CODING SEQUENCE POLYMORPHISMS IN 
VASCULAR PATHOLOGY GENES 

RELATED APPLICATIONS 

This application is a Continuation-in-Part of U.S. Application No. 09/054,272,
filed April 1, 1998, the contents of which are incorporated herein in their entirety by
reference.

BACKGROUND OF THE INVENTION 

The genomes of all organisms undergo spontaneous mutation in the course of
their continuing evolution, generating variant forms of progenitor sequences (Gusella,
Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form may confer an
evolutionary advantage or disadvantage relative to a progenitor form or may be
neutral. In some instances, a variant form confers a lethal disadvantage and is not
transmitted to subsequent generations of the organism. In other instances, a variant
form confers an evolutionary advantage to the species and is eventually incorporated
into the DNA of many or most members of the species and effectively becomes the
progenitor form. In many instances, both progenitor and variant form(s) survive and
co-exist in a species population. The coexistence of multiple forms of a sequence
gives rise to polymorphisms.

Several different types of polymorphism have been reported. A restriction
fragment length polymorphism (RFLP) is a variation in DNA sequence that alters the
length of a restriction fragment (Botstein et al., Am. J. Hum. Genet. 32, 314-331
(1980)). The restriction fragment length polymorphism may create or delete a
restriction site, thus changing the length of the restriction fragment. RFLPs have been
widely used in human and animal genetic analyses (see WO 90/13668; WO 90/11369;
Donis-Keller, Cell 51, 319-337 (1987); Lander et al., Genetics 121, 85-99 (1989)).
When a heritable trait can be linked to a particular RFLP, the presence of the RFLP in
an individual can be used to predict the likelihood that the individual will also exhibit
the trait.

Other polymorphisms take the form of short tandem repeats (STRs) that
include tandem di-, tri- and tetra-nucleotide repeated motifs. These tandem repeats
are also referred to as variable number tandem repeat (VNTR) polymorphisms.
VNTRs have been used in identity and paternity analysis (US 5,075,217; Armour et
al., FEBS Lett. 307, 113-115 (1992); Horn et al., WO 91/14003; Jeffreys, EP
370,719), and in a large number of genetic mapping studies.

Other polymorphisms take the form of single nucleotide variations between
individuals of the same species. Such polymorphisms are far more frequent than
RFLPs, STRs and VNTRs. Some single nucleotide polymorphisms (SNPs) occur in
protein-coding sequences (coding sequence SNPs (cSNPs)), in which case one of the
polymorphic forms may give rise to the expression of a defective or otherwise variant
protein and, potentially, a genetic disease. Examples of genes in which
polymorphisms within coding sequences give rise to genetic disease include β-globin
(sickle cell anemia), apoE4 (Alzheimer's Disease), Factor V Leiden (thrombosis), and
CFTR (cystic fibrosis). cSNPs can alter the codon sequence of the gene and therefore
specify an alternative amino acid. Such changes are called "missense" when another
amino acid is substituted, and "nonsense" when the alternative codon specifies a stop
signal in protein translation. When the cSNP does not alter the amino acid specified,
the cSNP is called "silent".
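
The missense, nonsense and silent categories can be illustrated with a short Python sketch. It is an illustration only and not part of the described work: the function name `classify_csnp` is hypothetical, the standard genetic code is assumed, and a change that converts a stop codon into an amino acid codon is reported as "missense" for simplicity.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_csnp(ref_codon, pos, alt_base):
    """Classify a single-base change at 0-based position `pos` of `ref_codon`.

    Returns "silent", "nonsense" or "missense" ("*" marks a stop codon).
    """
    alt_codon = ref_codon[:pos] + alt_base + ref_codon[pos + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "silent"      # same amino acid (or stop) is specified
    if alt_aa == "*":
        return "nonsense"    # the variant codon specifies a stop signal
    return "missense"        # a different amino acid is substituted
```

For example, the GAG to GTG change associated with sickle cell anemia substitutes valine for glutamic acid, so `classify_csnp("GAG", 1, "T")` falls in the missense category.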

Other single nucleotide polymorphisms occur in noncoding regions. Some of
these polymorphisms may also result in defective protein expression (e.g., as a result
of defective splicing). Other single nucleotide polymorphisms have no phenotypic
effects.

Single nucleotide polymorphisms can be used in the same manner as RFLPs
and VNTRs, but offer several advantages. Single nucleotide polymorphisms occur
with greater frequency and are spaced more uniformly throughout the genome than
other forms of polymorphism. The greater frequency and uniformity of single
nucleotide polymorphisms mean that there is a greater probability that such a
polymorphism will be found in close proximity to a genetic locus of interest than
would be the case for other polymorphisms. The different forms of characterized
single nucleotide polymorphisms are often easier to distinguish than other types of
polymorphism (e.g., by use of assays employing allele-specific hybridization probes
or primers).

Only a small percentage of the total repository of polymorphisms in humans
and other organisms has been identified. The limited number of polymorphisms
identified to date is due to the large amount of work required for their detection by
conventional methods. For example, a conventional approach to identifying
polymorphisms might be to sequence the same stretch of DNA in a population of
individuals by dideoxy sequencing. In this type of approach, the amount of work
increases in proportion to both the length of sequence and the number of individuals
in a population and becomes impractical for large stretches of DNA or large numbers
of persons.

SUMMARY OF THE INVENTION

Work described herein pertains to the identification of polymorphisms which
can predispose individuals to disease, particularly vascular pathologies, by
resequencing large numbers of genes in a large number of individuals. Eighteen
genes in a minimum of 30 individuals have been resequenced as described herein, and
92 SNPs have been discovered (see the Table). Forty of these SNPs are cSNPs which
specify a different amino acid sequence, while 49 of the SNPs are silent cSNPs.
Three of the SNPs were located in non-coding regions.

The invention relates to a gene which comprises a single nucleotide
polymorphism at a specific location. In a particular embodiment the invention relates
to the variant allele of a gene having a single nucleotide polymorphism, which variant
allele differs from a reference allele by one nucleotide at the site(s) identified in the
Table. Complements of these nucleic acid segments are also included. The segments
can be DNA or RNA, and can be double- or single-stranded. Segments can be, for
example, 5-10, 5-15, 10-20, 5-25, 10-30, 10-50 or 10-100 bases long.

The invention further provides allele-specific oligonucleotides that hybridize
to a gene comprising a single nucleotide polymorphism or to the complement of the
gene. These oligonucleotides can be probes or primers.

The invention further provides a method of analyzing a nucleic acid from an
individual. The method determines which base is present at any one of the
polymorphic sites shown in the Table. Optionally, a set of bases occupying a set of
the polymorphic sites shown in the Table is determined. This type of analysis can be
performed on a number of individuals, who are tested for the presence of a disease
phenotype. The presence or absence of disease phenotype is then correlated with a
base or set of bases present at the polymorphic site or sites in the individuals tested.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A-1C are a table illustrating the locations of single nucleotide
polymorphisms of various genes.

Figure 2 is a listing of the genes from Figures 1A-C with their corresponding
GenBank Accession numbers and the nucleotide position within that sequence at
which the single nucleotide polymorphism is located.

Figures 3A-B are a listing of the nucleotide sequence corresponding to
GenBank Accession number D10202 for the gene PTAFR.

Figures 4A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number D29832 for the gene AT3.

Figures 5A-C are a listing of the nucleotide sequence corresponding to the
GenBank Accession number D38081 for the gene TBXA2R.

Figures 6A-C are a listing of the nucleotide sequence corresponding to the
GenBank Accession number J02703 for the gene ITGB3.

Figures 7A-C are a listing of the nucleotide sequence corresponding to the
GenBank Accession number J02764 for the gene ITGA2B.

Figures 8A-F are a listing of the nucleotide sequence corresponding to the
GenBank Accession number J02846 for the gene F3.

Figures 9A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number J02898 for the gene CETP.

Figures 10A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number J03225 for the gene TFPI.

Figures 11A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number K02059 for the gene PROC.

Figure 12 is a listing of the nucleotide sequence corresponding to the GenBank
Accession number L00336 for the gene LDLR.

Figure 13 is a listing of the nucleotide sequence corresponding to the GenBank
Accession number L00338.

Figure 14 is a listing of the nucleotide sequence corresponding to the GenBank
Accession number L00343 for the gene LDLR.

Figure 15 is a listing of the nucleotide sequence corresponding to the GenBank
Accession number L00344 for the gene LDLR.

Figure 16 is a listing of the nucleotide sequence corresponding to the GenBank
Accession number L00345 for the gene LDLR.

Figure 17 is a listing of the nucleotide sequence corresponding to the GenBank
Accession number L00347 for the gene LDLR.

Figure 18 is a listing of the nucleotide sequence corresponding to the GenBank
Accession number L00349 for the gene LDLR.

Figures 19A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number L00351 for the gene LDLR.

Figures 20A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number L29401 for the gene LDLR.

Figures 21A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number L32765 for the gene F5.

Figures 22A-C are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M11058 for the gene HMGCR.

Figures 23A-F are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M11228 for the gene PROC.

Figures 24A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M12625 for the gene LCAT.

Figures 25A-C are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M12849 for the gene HCF2.

Figures 26A-E are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M14335 for the gene F5.

Figures 27A-C are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M15856 for the gene LPL.

Figures 28A-N are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M17262 for the gene F2.

Figures 29A-C are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M20311 for the gene ITGB3.

Figure 30 is a listing of the nucleotide sequence corresponding to the GenBank
Accession number M21645 for the gene AT3.

Figures 31A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M22569 for the gene ITGA2B.

Figures 32A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M30185 for the gene CETP.

Figures 33A-H are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M33320 for the gene ITGA2B.

Figures 34A-G are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M58600 for the gene HCF2.

Figures 35A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M62424 for the gene F2R.

Figures 36A-C are a listing of the nucleotide sequence corresponding to the
GenBank Accession number M76722 for the gene LPL.

Figures 37A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number U59436 for the gene LDLR.

Figures 38A-B are a listing of the nucleotide sequence corresponding to the
GenBank Accession number Z22555 for the gene CLanalog.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention relates to a gene which comprises a single nucleotide
polymorphism (SNP) at a specific location. The gene which includes the SNP has at
least two alleles, referred to herein as the reference allele and the variant allele. The
reference allele (prototypical or wild type allele) has been designated arbitrarily and
typically corresponds to the nucleotide sequence of the gene which has been deposited
with GenBank under a given Accession number. The variant allele differs from the
reference allele by one nucleotide at the site(s) identified in the Table. The present
invention also relates to variant alleles of the described genes and to complements of
the variant alleles. The invention further relates to portions of the variant alleles and
portions of complements of the variant alleles which comprise (encompass) the site of
the SNP and are at least 5 nucleotides in length. Portions can be, for example, 5-10,
5-15, 10-20, 5-25, 10-30, 10-50 or 10-100 bases long. For example, a portion of a
variant allele which is 5 nucleotides in length includes the single nucleotide
polymorphism (the nucleotide which differs from the reference allele at that site) and
four additional nucleotides which flank the site in the variant allele. These
nucleotides can be on one or both sides of the polymorphism. Polymorphisms which
are the subject of this invention are defined in the Table with respect to the reference
sequence deposited in GenBank under the Accession number indicated. For example,
the invention relates to a portion of a gene (e.g., AT3) having a nucleotide sequence
as deposited in GenBank (e.g., M21645) comprising a single nucleotide
polymorphism at a specific position (e.g., nucleotide 100). The reference allele for
AT3 is shown in column 15 and the variant allele is shown in column 17 of the Table.
The nucleotide sequences of the invention can be double- or single-stranded.
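
As an illustration of what "a portion which comprises the site of the SNP" means, the following Python sketch enumerates every window of a given length that includes the polymorphic site, so that the flanking nucleotides fall on one or both sides. The function name `portions_containing_site` and the example sequence are hypothetical and are not taken from the Table.

```python
def portions_containing_site(variant_seq, site, length=5):
    """Return every contiguous portion of `length` bases of `variant_seq`
    that includes the polymorphic base at 0-based index `site`."""
    if not 0 <= site < len(variant_seq):
        raise ValueError("site lies outside the sequence")
    if not 1 <= length <= len(variant_seq):
        raise ValueError("length must be between 1 and the sequence length")
    first = max(0, site - length + 1)             # leftmost start that still covers the site
    last = min(site, len(variant_seq) - length)   # rightmost start that still covers the site
    return [variant_seq[s:s + length] for s in range(first, last + 1)]


# Made-up 10-base variant sequence with the polymorphic base at index 4.
portions = portions_containing_site("AACCGTTGGA", site=4, length=5)
```

With the polymorphic base far enough from both ends, a 5-nucleotide portion can be drawn in five ways, ranging from four flanking bases on the 5' side to four flanking bases on the 3' side.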

The invention further provides allele-specific oligonucleotides that hybridize
to a gene comprising a single nucleotide polymorphism or to the complement of the
gene. These oligonucleotides can be probes or primers.

The invention further provides a method of analyzing a nucleic acid from an
individual. The method determines which base is present at any one of the
polymorphic sites shown in the Table. Optionally, a set of bases occupying a set of
the polymorphic sites shown in the Table is determined. This type of analysis can be
performed on a number of individuals, who are tested for the presence of a disease
phenotype. The presence or absence of disease phenotype is then correlated with a
base or set of bases present at the polymorphic site or sites in the individuals tested.
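
One simple way to examine such a correlation is a 2x2 contingency table of base against phenotype. The Python sketch below is an assumption-laden illustration: the text does not name a statistical test, the Pearson chi-square statistic is chosen here only as a familiar example, and the function names and counts are hypothetical.

```python
def tally_2x2(records, base):
    """Count individuals by (carries `base`?, has phenotype?).

    `records` is an iterable of (base_at_site, has_phenotype) pairs.
    Returns (a, b, c, d): base+affected, base+unaffected,
    other+affected, other+unaffected.
    """
    a = b = c = d = 0
    for observed, affected in records:
        if observed == base:
            if affected:
                a += 1
            else:
                b += 1
        else:
            if affected:
                c += 1
            else:
                d += 1
    return a, b, c, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Made-up counts for illustration only.
records = ([("T", True)] * 30 + [("T", False)] * 20
           + [("C", True)] * 10 + [("C", False)] * 40)
statistic = chi_square_2x2(*tally_2x2(records, "T"))
```

A larger statistic indicates a stronger departure from independence between the base and the phenotype; judging significance would additionally require a reference distribution and attention to multiple testing, which this sketch leaves out.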



WO 99/50454                                                    PCT/US99/06473



DEFINITIONS 

An oligonucleotide can be DNA or RNA, and single- or double-stranded.
Oligonucleotides can be naturally occurring or synthetic, but are typically prepared by
synthetic means. Preferred oligonucleotides of the invention include segments of
DNA, or their complements, which include any one of the polymorphic sites shown in
the Table. The segments can be between 5 and 250 bases, and, in specific
embodiments, are between 5-10, 5-20, 10-20, 10-50, 20-50 or 10-100 bases. The
polymorphic site can occur within any position of the segment. The segments can be
from any of the allelic forms of DNA shown in the Table.

As used herein, the terms "nucleotide" and "nucleic acid" are intended to be
equivalent. The terms "nucleotide sequence", "nucleic acid sequence", "nucleic acid
molecule" and "segment" are intended to be equivalent.

Hybridization probes are oligonucleotides which bind in a base-specific
manner to a complementary strand of nucleic acid. Such probes include peptide
nucleic acids, as described in Nielsen et al., Science 254, 1497-1500 (1991). Probes
can be any length suitable for specific hybridization to the target nucleic acid
sequence. The most appropriate length of the probe may vary depending upon the
hybridization method in which it is being used; for example, particular lengths may be
more appropriate for use in microfabricated arrays, while other lengths may be more
suitable for use in classical hybridization methods. Suitable probes and primers can
range from about 5 nucleotides to about 30 nucleotides in length. For example,
probes and primers can be 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25, 26, 28 or 30
nucleotides in length. The probe or primer preferably contains at least one
polymorphic site occupied by any of the possible variant nucleotides. The nucleotide
sequence can correspond to the coding sequence of the allele or to the complement of
the coding sequence of the allele.

As used herein, the term "primer" refers to a single-stranded oligonucleotide
which acts as a point of initiation of template-directed DNA synthesis under
appropriate conditions (e.g., in the presence of four different nucleoside triphosphates
and an agent for polymerization, such as DNA or RNA polymerase or reverse
transcriptase) in an appropriate buffer and at a suitable temperature. The appropriate
length of a primer depends on the intended use of the primer, but typically ranges
from 15 to 30 nucleotides. Short primer molecules generally require cooler
temperatures to form sufficiently stable hybrid complexes with the template. A
primer need not reflect the exact sequence of the template, but must be sufficiently
complementary to hybridize with a template. The term primer site refers to the area
of the target DNA to which a primer hybridizes. The term primer pair refers to a set
of primers including a 5' (upstream) primer that hybridizes with the 5' end of the DNA
sequence to be amplified and a 3' (downstream) primer that hybridizes with the
complement of the 3' end of the sequence to be amplified.

As used herein, linkage describes the tendency of genes, alleles, loci or genetic
markers to be inherited together as a result of their location on the same chromosome.
It can be measured by percent recombination between the two genes, alleles, loci or
genetic markers.
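
As a minimal illustration of this measure, percent recombination can be computed from counts of recombinant and non-recombinant offspring. The function name and the counts in the example are hypothetical.

```python
def percent_recombination(recombinant, nonrecombinant):
    """Percentage of scored offspring that are recombinant for two markers."""
    total = recombinant + nonrecombinant
    if total == 0:
        raise ValueError("no offspring were scored")
    return 100.0 * recombinant / total


# Made-up example: 12 recombinants among 100 scored offspring.
example = percent_recombination(12, 88)
```

Values near 50 percent are what independently assorting markers would show, while lower values indicate that the two markers tend to be inherited together.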

As used herein, polymorphism refers to the occurrence of two or more
genetically determined alternative sequences or alleles in a population. A
polymorphic marker or site is the locus at which divergence occurs. Preferred
markers have at least two alleles, each occurring at a frequency of greater than 1%, and
more preferably greater than 10% or 20% of a selected population. A polymorphic
locus may be as small as one base pair. Polymorphic markers include restriction
fragment length polymorphisms, variable number of tandem repeats (VNTRs),
hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats,
tetranucleotide repeats, simple sequence repeats, and insertion elements such as Alu.
The first identified allelic form is arbitrarily designated as the reference form and
other allelic forms are designated as alternative or variant alleles. The allelic form
occurring most frequently in a selected population is sometimes referred to as the
wildtype form. Diploid organisms may be homozygous or heterozygous for allelic
forms. A diallelic or biallelic polymorphism has two forms. A triallelic
polymorphism has three forms.
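
The frequency criteria for preferred markers can be checked directly from genotype data. The following Python sketch assumes diploid genotypes written as two-letter strings such as "AG"; the function names and the sample data are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as strings like "AG"."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_preferred_marker(freqs, threshold=0.01):
    """True if at least two alleles each occur at a frequency above `threshold`."""
    return sum(1 for f in freqs.values() if f > threshold) >= 2


# Made-up sample of 100 individuals: 90 AA, 9 AG and 1 GG.
sample = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
freqs = allele_frequencies(sample)
```

In this made-up sample the G allele accounts for 11 of 200 chromosomes, so the site passes the 1% criterion but not the more stringent 10% criterion.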

Work described herein pertains to the resequencing of large numbers of genes
in a large number of individuals to identify polymorphisms which can predispose
individuals to disease, particularly vascular pathologies. Eighteen genes in a
minimum of 30 individuals have been resequenced as described herein, and 92 SNPs
have been discovered (see the Table). Forty of these SNPs are cSNPs which specify a
different amino acid sequence, while 49 of the SNPs are silent cSNPs. Three of the
SNPs were located in non-coding regions.

The 18 genes which were subjected to analysis encode proteins that are
involved in biochemical pathways that regulate blood coagulation, lipid metabolism,
and platelet and endothelial cell function. Polymorphisms in all 18 genes are
candidates for genetic factors that influence the pathophysiology of the blood and
blood vessels and thus can be relevant to the genetic risk of cardiovascular diseases.

The identified polymorphisms can also be relevant to other disease categories.

By altering amino acid sequence, SNPs may alter the function of the encoded
proteins. The discovery of the SNP facilitates biochemical analysis of the variants
and the development of assays to characterize the variants and to screen for
pharmaceuticals that would interact directly with one or another form of the protein.
SNPs (including silent SNPs) may also alter the regulation of the gene at the
transcriptional or post-transcriptional level. SNPs (including silent SNPs) also enable
the development of specific DNA, RNA, or protein-based diagnostics that detect the
presence or absence of the polymorphism in particular conditions.

A single nucleotide polymorphism occurs at a polymorphic site occupied by a
single nucleotide, which is the site of variation between allelic sequences. The site is
usually preceded by and followed by highly conserved sequences of the allele (e.g.,
sequences that vary in less than 1/100 or 1/1000 members of the populations).

A single nucleotide polymorphism usually arises due to substitution of one
nucleotide for another at the polymorphic site. A transition is the replacement of one
purine by another purine or one pyrimidine by another pyrimidine. A transversion is
the replacement of a purine by a pyrimidine or vice versa. Single nucleotide
polymorphisms can also arise from a deletion of a nucleotide or an insertion of a
nucleotide relative to a reference allele. Typically the polymorphic site is occupied by
a base other than the reference base. For example, where the reference allele contains
the base "T" at the polymorphic site, the altered allele can contain a "C", "G" or "A" at
the polymorphic site.
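
The distinction between transitions and transversions follows directly from whether the two bases belong to the same chemical class. A minimal Python sketch, with a hypothetical function name and DNA bases only:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be A, C, G or T")
    if ref == alt:
        raise ValueError("reference and variant bases are identical")
    # Same class (purine/purine or pyrimidine/pyrimidine) means transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For a reference "T", a "C" variant is therefore a transition, whereas a "G" or "A" variant is a transversion.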

Hybridizations are usually performed under stringent conditions, for example,
at a salt concentration of no more than 1 M and a temperature of at least 25°C. For
example, conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM EDTA,
pH 7.4) and a temperature of 25-30°C, or equivalent conditions, are suitable for
allele-specific probe hybridizations. Equivalent conditions can be determined by
varying one or more of the parameters given as an example, as known in the art, while
maintaining a similar degree of identity or similarity between the target nucleotide
sequence and the primer or probe used.

The term "isolated" is used herein to indicate that the material in question
exists in a physical milieu distinct from that in which it occurs in nature. For
example, an isolated nucleic acid of the invention may be substantially isolated with
respect to the complex cellular milieu in which it naturally occurs. In some instances,
the isolated material will form part of a composition (for example, a crude extract
containing other substances), buffer system or reagent mix. In other circumstances, the
material may be purified to essential homogeneity, for example as determined by
PAGE or column chromatography such as HPLC. Preferably, an isolated nucleic acid
comprises at least about 50, 80 or 90 percent (on a molar basis) of all macromolecular
species present.

I. Novel Polymorphisms of the Invention 

The novel polymorphisms of the invention are shown in the Table. 

II. Analysis of Polymorphisms 

A. Preparation of Samples 

Polymorphisms are detected in a target nucleic acid from an individual being
analyzed. For assay of genomic DNA, virtually any biological sample (other than
pure red blood cells) is suitable. For example, convenient tissue samples include
whole blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair.
For assay of cDNA or mRNA, the tissue sample must be obtained from an organ in
which the target nucleic acid is expressed. For example, if the target nucleic acid is a
cytochrome P450, the liver is a suitable source.

Many of the methods described below require amplification of DNA from
target samples. This can be accomplished by, e.g., PCR. See generally PCR
Technology: Principles and Applications for DNA Amplification (ed. H.A. Erlich,
Freeman Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and
Applications (eds. Innis et al., Academic Press, San Diego, CA, 1990); Mattila et al.,
Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1,
17 (1991); PCR (eds. McPherson et al., IRL Press, Oxford); and U.S. Patent
4,683,202.

Other suitable amplification methods include the ligase chain reaction (LCR)
(see Wu and Wallace, Genomics 4, 560 (1989); Landegren et al., Science 241, 1077
(1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86, 1173
(1989)), self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci.
USA 87, 1874 (1990)) and nucleic acid based sequence amplification (NASBA). The
latter two amplification methods involve isothermal reactions based on isothermal
transcription, which produce both single stranded RNA (ssRNA) and double stranded
DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1,
respectively.

B. Detection of Polymorphisms in Target DNA 

There are two distinct types of analysis of target DNA for detecting
polymorphisms. The first type of analysis, sometimes referred to as de novo
characterization, is carried out to identify polymorphic sites not previously
characterized (i.e., to identify new polymorphisms). This analysis compares target
sequences in different individuals to identify points of variation, i.e., polymorphic
sites. By analyzing groups of individuals representing the greatest ethnic diversity
among humans and greatest breed and species variety in plants and animals, patterns
characteristic of the most common alleles/haplotypes of the locus can be identified,
and the frequencies of such alleles/haplotypes in the population can be determined.
Additional allelic frequencies can be determined for subpopulations characterized by
criteria such as geography, race, or gender. The de novo identification of
polymorphisms of the invention is described in the Examples section. The second
type of analysis determines which form(s) of a characterized (known) polymorphism
are present in individuals under test. There are a variety of suitable procedures, which
are discussed in turn.

1. Allele-Specific Probes

The design and use of allele-specific probes for analyzing polymorphisms is
described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726;
Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a
segment of target DNA from one individual but do not hybridize to the corresponding
segment from another individual due to the presence of different polymorphic forms
in the respective segments from the two individuals. Hybridization conditions should
be sufficiently stringent that there is a significant difference in hybridization intensity
between alleles, and preferably an essentially binary response, whereby a probe
hybridizes to only one of the alleles. Some probes are designed to hybridize to a
segment of target DNA such that the polymorphic site aligns with a central position
(e.g., in a 15-mer at the 7 position; in a 16-mer, at either the 8 or 9 position) of the
probe. This design of probe achieves good discrimination in hybridization between
different allelic forms.
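
The central placement of the polymorphic site can be sketched in Python as follows. This is an illustration under stated assumptions: the function name, the example sequence and the choice to write each probe in the same sense as the listed strand (so that it would hybridize to the complementary strand) are hypothetical, and the position within the probe is counted from 1.

```python
def allele_specific_probe(seq, site, allele, length=15, site_pos=7):
    """Return a probe of `length` bases carrying `allele` at 1-based
    position `site_pos`; `site` is the 0-based index of the polymorphic
    base in `seq`. The probe is written in the same sense as `seq`."""
    start = site - (site_pos - 1)
    end = start + length
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for this probe")
    window = seq[start:end]
    offset = site_pos - 1
    return window[:offset] + allele + window[offset + 1:]


# Made-up 20-base reference segment with a G at index 10.
reference = "ACGTACGTACGTACGTACGT"
ref_probe = allele_specific_probe(reference, site=10, allele="G")
var_probe = allele_specific_probe(reference, site=10, allele="A")
mismatches = sum(1 for x, y in zip(ref_probe, var_probe) if x != y)
```

The two probes differ only at the chosen position, so under sufficiently stringent conditions each would be expected to hybridize preferentially to its own allelic form.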

Allele-specific probes are often used in pairs, one member of a pair showing a
perfect match to a reference form of a target sequence and the other member showing
a perfect match to a variant form. Several pairs of probes can then be immobilized on
the same support for simultaneous analysis of multiple polymorphisms within the
same target sequence.

2. Tiling Arrays 

The polymorphisms can also be identified by hybridization to nucleic acid
arrays, some examples of which are described in WO 95/11995. One form of such
arrays is described in the Examples section in connection with de novo identification
of polymorphisms. The same array or a different array can be used for analysis of
characterized polymorphisms. WO 95/11995 also describes subarrays that are
optimized for detection of a variant form of a precharacterized polymorphism. Such a
subarray contains probes designed to be complementary to a second reference
sequence, which is an allelic variant of the first reference sequence. The second group
of probes is designed by the same principles as described in the Examples, except that
the probes exhibit complementarity to the second reference sequence. The inclusion
of a second group (or further groups) can be particularly useful for analyzing short
subsequences of the primary reference sequence in which multiple mutations are
expected to occur within a short distance commensurate with the length of the probes
(e.g., two or more mutations within 9 to 21 bases).

3. Allele-Specific Primers 

An allele-specific primer hybridizes to a site on target DNA overlapping a
polymorphism and only primes amplification of an allelic form to which the primer
exhibits perfect complementarity. See Gibbs, Nucleic Acids Res. 17, 2427-2448
(1989). This primer is used in conjunction with a second primer which hybridizes at a
distal site. Amplification proceeds from the two primers, resulting in a detectable
product which indicates the particular allelic form is present. A control is usually
performed with a second pair of primers, one of which shows a single base mismatch
at the polymorphic site and the other of which exhibits perfect complementarity to a
distal site. The single-base mismatch prevents amplification and no detectable
product is formed. The method works best when the mismatch is included in the
3'-most position of the oligonucleotide aligned with the polymorphism because this
position is most destabilizing to elongation from the primer (see, e.g., WO 93/22456).
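
Placing the polymorphic base at the 3'-most position of the primer can be sketched in Python as follows. The function names and the example sequence are hypothetical, the primer is written in the same sense as the listed strand, and the `is_expected_to_prime` predicate is a deliberate idealization that treats a perfect 3' match as necessary and sufficient for extension.

```python
def allele_specific_primer(seq, site, allele, length=20):
    """Return a primer of `length` bases whose 3'-most base is `allele`,
    aligned with the polymorphic base at 0-based index `site` of `seq`."""
    start = site - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer")
    return seq[start:site] + allele


def is_expected_to_prime(primer, template_allele):
    """Idealized rule: extension occurs only if the 3'-most base matches."""
    return primer[-1] == template_allele


# Made-up 24-base segment with an A at index 20.
segment = "ACGTACGTACGTACGTACGTACGT"
ref_primer = allele_specific_primer(segment, site=20, allele="A")
var_primer = allele_specific_primer(segment, site=20, allele="G")
```

Used against a template carrying "A" at the site, the reference primer would be expected to give product and the variant primer to serve as the mismatched control.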

4. Direct-Sequencing 

The direct analysis of the sequence of polymorphisms of the present invention
can be accomplished using either the dideoxy chain termination method or the
Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd
Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory Manual,
(Acad. Press, 1988)).

5. Denaturing Gradient Gel Electrophoresis 

Amplification products generated using the polymerase chain reaction can be
analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can
be identified based on the different sequence-dependent melting properties and
electrophoretic migration of DNA in solution. Erlich, ed., PCR Technology,
Principles and Applications for DNA Amplification, (W.H. Freeman and Co, New
York, 1992), Chapter 7.

6. Single-Strand Conformation Polymorphism Analysis 
Alleles of target sequences can be differentiated using single-strand 
conformation polymorphism analysis, which identifies base differences by alteration 
in electrophoretic migration of single stranded PCR products, as described in Orita et 
5 al, Proa Nat Acad. ScL 86, 2766-2770 (1989). Amplified PCR products can be 
generated as described above, and heated or otherwise denatured, to form single 
stranded amplification products. Single-stranded nucleic acids may refold or form 
secondary structures which are partially dependent on the base sequence. The 
different electrophoretic mobilities of single-stranded amplification products can be 
10 related to base-sequence differences between alleles of target sequences. 

III. Methods of Use

After determining polymorphic form(s) present in an individual at one or more 
polymorphic sites, this information can be used in a number of methods. 

A. Forensics 

Determination of which polymorphic forms occupy a set of polymorphic sites
in an individual identifies a set of polymorphic forms that distinguishes the individual.
See generally National Research Council, The Evaluation of Forensic DNA Evidence
(Eds. Pollard et al., National Academy Press, DC, 1996). The more sites that are
analyzed, the lower the probability that the set of polymorphic forms in one individual
is the same as that in an unrelated individual. Preferably, if multiple sites are
analyzed, the sites are unlinked. Thus, polymorphisms of the invention are often used
in conjunction with polymorphisms in distal genes. Preferred polymorphisms for use
in forensics are biallelic because the population frequencies of two polymorphic forms
can usually be determined with greater accuracy than those of multiple polymorphic
forms at multi-allelic loci.

The capacity to identify a distinguishing or unique set of forensic markers in 
an individual is useful for forensic analysis. For example, one can determine whether 
a blood sample from a suspect matches a blood or other tissue sample from a crime 
scene by determining whether the set of polymorphic forms occupying selected
polymorphic sites is the same in the suspect and the sample. If the set of polymorphic
markers does not match between a suspect and a sample, it can be concluded (barring
experimental error) that the suspect was not the source of the sample. If the set of
markers does match, one can conclude that the DNA from the suspect is consistent
with that found at the crime scene. If frequencies of the polymorphic forms at the loci
tested have been determined (e.g., by analysis of a suitable population of individuals),
one can perform a statistical analysis to determine the probability that a match of
suspect and crime scene sample would occur by chance.

p(ID) is the probability that two random individuals have the same
polymorphic or allelic form at a given polymorphic site. In biallelic loci, four
genotypes are possible: AA, AB, BA, and BB. If alleles A and B occur in a haploid
genome of the organism with frequencies x and y, the probability of each genotype in
a diploid organism is (see WO 95/12607):
Homozygote: p(AA) = x^2
Homozygote: p(BB) = y^2 = (1-x)^2
Single Heterozygote: p(AB) = p(BA) = xy = x(1-x)
Both Heterozygotes: p(AB+BA) = 2xy = 2x(1-x)

The probability of identity at one locus (i.e., the probability that two
individuals, picked at random from a population, will have identical polymorphic
forms at a given locus) is given by the equation:
p(ID) = (x^2)^2 + (2xy)^2 + (y^2)^2.

These calculations can be extended for any number of polymorphic forms at a 
given locus. For example, the probability of identity p(ID) for a 3-allele system
where the alleles have the frequencies in the population of x, y and z, respectively, is
equal to the sum of the squares of the genotype frequencies:
p(ID) = x^4 + (2xy)^2 + (2yz)^2 + (2xz)^2 + z^4 + y^4

In a locus of n alleles, the appropriate binomial expansion is used to calculate 
p(ID) and p(exc). 
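
As a non-limiting sketch, the sum-of-squared-genotype-frequencies calculation described above can be written for any number of alleles at a locus. The function names are hypothetical, and the sketch assumes, as the equations above do, that genotype frequencies follow from the allele frequencies as the squares and doubled cross-products.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Return {(i, j): frequency} for every unordered genotype at one locus."""
    freqs = list(allele_freqs)
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    result = {}
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        # Homozygotes occur at x^2; heterozygotes (both orders) at 2xy.
        result[(i, j)] = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
    return result


def probability_of_identity(allele_freqs):
    """p(ID): sum of the squares of the genotype frequencies."""
    return sum(g ** 2 for g in genotype_frequencies(allele_freqs).values())


# Biallelic locus with x = y = 0.5: (0.25)^2 + (0.5)^2 + (0.25)^2
print(probability_of_identity([0.5, 0.5]))
```

For a three-allele locus the same function reproduces the six-term expression given above.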

The cumulative probability of identity (cum p(ID)) for each of multiple 
unlinked loci is determined by multiplying the probabilities provided by each locus. 
cum p(ID) = p(ID1)p(ID2)p(ID3)... p(IDn)

The cumulative probability of non-identity for n loci (i.e., the probability that
two random individuals will be different at 1 or more loci) is given by the equation:

cum p(nonID) = 1 - cum p(ID).
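
The cumulative calculation can be sketched in the same way. The helper names and the example per-locus values below are hypothetical; the sketch assumes the loci are unlinked, so that the per-locus probabilities may be multiplied.

```python
import math


def cumulative_identity(per_locus_p_id):
    """cum p(ID): product of the per-locus probabilities of identity."""
    values = list(per_locus_p_id)
    if any(not 0.0 <= p <= 1.0 for p in values):
        raise ValueError("each p(ID) must lie between 0 and 1")
    return math.prod(values)


def cumulative_non_identity(per_locus_p_id):
    """cum p(nonID) = 1 - cum p(ID)."""
    return 1.0 - cumulative_identity(per_locus_p_id)


# Hypothetical example: twenty unlinked biallelic loci, each with p(ID) = 0.375.
loci = [0.375] * 20
print(cumulative_identity(loci), cumulative_non_identity(loci))
```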

If several polymorphic loci are tested, the cumulative probability of
non-identity for random individuals becomes very high (e.g., one billion to one). Such
probabilities can be taken into account together with other evidence in determining
the guilt or innocence of the suspect.

B. Paternity Testing

The object of paternity testing is usually to determine whether a male is the 
father of a child. In most cases, the mother of the child is known and thus, the 
mother's contribution to the child's genotype can be traced. Paternity testing
investigates whether the part of the child's genotype not attributable to the mother is
consistent with that of the putative father. Paternity testing can be performed by 
analyzing sets of polymorphisms in the putative father and the child. 

If the set of polymorphisms in the child attributable to the father does not 
match the set of polymorphisms of the putative father, it can be concluded, barring 
experimental error, that the putative father is not the real father. If the set of
polymorphisms in the child attributable to the father does match the set of 
polymorphisms of the putative father, a statistical calculation can be performed to 
determine the probability of coincidental match. 

The probability of parentage exclusion (representing the probability that a 
random male will have a polymorphic form at a given polymorphic site that makes
him incompatible as the father) is given by the equation (see WO 95/12607): 

p(exc) = xy(1-xy)

where x and y are the population frequencies of alleles A and B of a biallelic 
polymorphic site. 

(At a triallelic site, p(exc) = xy(1-xy) + yz(1-yz) + xz(1-xz) + 3xyz(1-xyz),
where x, y and z are the respective population frequencies of alleles A, B and C.)
The probability of non-exclusion is
p(non-exc) = 1 - p(exc)

The cumulative probability of non-exclusion (representing the value obtained 
when n loci are used) is thus:

cum p(non-exc) = p(non-exc1)p(non-exc2)p(non-exc3)... p(non-excn)

The cumulative probability of exclusion for n loci (representing the probability
that a random male will be excluded) is:

cum p(exc) = 1 - cum p(non-exc).

If several polymorphic loci are included in the analysis, the cumulative
probability of exclusion of a random male is very high. This probability can be taken
into account in assessing the liability of a putative father whose polymorphic marker
set matches the child's polymorphic marker set attributable to his/her father.
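
For illustration, the biallelic exclusion formula and its cumulative form can be sketched as follows. The function names and the example allele frequencies are hypothetical; only the biallelic expression p(exc) = xy(1-xy) given above is implemented.

```python
import math


def exclusion_probability_biallelic(x):
    """p(exc) = xy(1 - xy) for a biallelic site with allele frequencies x and y = 1 - x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    xy = x * (1.0 - x)
    return xy * (1.0 - xy)


def cumulative_exclusion(per_locus_p_exc):
    """cum p(exc) = 1 - product over loci of (1 - p(exc))."""
    non_exclusion = math.prod(1.0 - p for p in per_locus_p_exc)
    return 1.0 - non_exclusion


# Hypothetical panel of biallelic sites, given by the frequency of allele A.
allele_a_freqs = [0.5, 0.4, 0.3, 0.5, 0.45]
per_locus = [exclusion_probability_biallelic(x) for x in allele_a_freqs]
print(per_locus[0], cumulative_exclusion(per_locus))
```

Because each additional locus multiplies the cumulative probability of non-exclusion by a factor below one, the cumulative probability of exclusion can only grow as loci are added.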

C. Correlation of Polymorphisms with Phenotypic Traits 

The polymorphisms of the invention may contribute to the phenotype of an
organism in different ways. Some polymorphisms occur within a protein coding
sequence and contribute to phenotype by affecting protein structure. The effect may
be neutral, beneficial or detrimental, or both beneficial and detrimental, depending on
the circumstances. For example, a heterozygous sickle cell mutation confers
resistance to malaria, but a homozygous sickle cell mutation is usually lethal. Other
polymorphisms occur in noncoding regions but may exert phenotypic effects
indirectly via influence on replication, transcription, and translation. A single
polymorphism may affect more than one phenotypic trait. Likewise, a single
phenotypic trait may be affected by polymorphisms in different genes. Further, some
polymorphisms predispose an individual to a distinct mutation that is causally related
to a certain phenotype.

Phenotypic traits include diseases that have known but hitherto unmapped
genetic components (e.g., agammaglobulinemia, diabetes insipidus, Lesch-Nyhan
syndrome, muscular dystrophy, Wiskott-Aldrich syndrome, Fabry's disease, familial
hypercholesterolemia, polycystic kidney disease, hereditary spherocytosis, von
Willebrand's disease, tuberous sclerosis, hereditary hemorrhagic telangiectasia,
familial colonic polyposis, Ehlers-Danlos syndrome, osteogenesis imperfecta, and
acute intermittent porphyria). Phenotypic traits also include symptoms of, or
susceptibility to, multifactorial diseases of which a component is or may be genetic,
such as autoimmune diseases, inflammation, cancer, diseases of the nervous system,
and infection by pathogenic microorganisms. Some examples of autoimmune
diseases include rheumatoid arthritis, multiple sclerosis, diabetes (insulin-dependent
and non-insulin-dependent), systemic lupus erythematosus and Graves' disease. Some
examples of cancers include cancers of the bladder, brain, breast, colon, esophagus,
kidney, leukemia, liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach and
uterus. Phenotypic traits also include characteristics such as longevity, appearance
(e.g., baldness, obesity), strength, speed, endurance, fertility, and susceptibility or
receptivity to particular drugs or therapeutic treatments.

The correlation of one or more polymorphisms with phenotypic traits can be
facilitated by knowledge of the gene product of the wild type (reference) gene. The
genes in which cSNPs of the present invention have been identified are genes which
have been previously sequenced and characterized in one of their allelic forms. For
example, genes of the present invention in which cSNPs have been identified include
genes encoding antithrombin III (Humphries, Semin Hematol 32:8-16 (1995);
Mammen, Semin Hematol 32:2-6 (1995)), cholesterol ester transfer protein (Bruce
and Tall, Curr Opin Lipidol 6:306-311 (1995)), CLanalog (HDL/scavenger receptor)
(Freeman, Curr Opin Hematol 4:41-47 (1997); Knecht and Glass, Adv Genet
32:141-198 (1995); Rigotti et al., Curr Opin Lipidol 5:181-188 (1997)), thrombin receptor
(Brass and Molino, Thromb Haemost 75:234-241 (1997); Jamieson, Thromb Haemost
75:242-246 (1997)), thrombin (Eisenberg, Coron Artery Dis 7:400-408 (1996);
Jamieson, Thromb Haemost 75:242-246 (1997)), and heparin cofactor II (Bick and
Pegram, Semin Thromb Hemost 20:109-132 (1994)). Also included are the genes
encoding HMG coA-reductase (Bjelajac et al., Ann Pharmacother 50:1304-1315
(1996)), platelet glycoprotein IIB and IIIA (Jamieson, Thromb Haemost 75:242-246
(1997); Lefkovits et al., N Engl J Med 332:1553-1559 (1995); Nurden, Thromb
Haemost 74:345-351 (1995)), lecithin:cholesterol acyltransferase (Kuivenhoven et al.,
J Lipid Res 55:191-205 (1997)), LDL receptor (Holvoet and Collen, Curr Opin
Lipidol 5:320-328 (1997); Rigotti et al., Curr Opin Lipidol 5:181-188 (1997)),
protein C (Bertina, Clin Chem 45:1678-1683 (1997); Bick and Pegram, Semin
Thromb Hemost 20:109-132 (1994); Humphries, Semin Hematol 32:8-16 (1995);
Koeleman et al., Semin Hematol 54:256-264 (1997)), platelet activating factor
receptor (Feuerstein et al., J Lipid Mediat Cell Signal 15:255-284 (1997); Shimizu
and Mutoh, Adv Exp Med Biol 407:197-204 (1997)), tissue factor (Abildgaard, Blood
Coagul Fibrinolysis 5:S45-49 (1995); Bick and Pegram, Semin Thromb Hemost
20:109-132 (1994); Harker et al., Haemostasis 7:76-82 (1996); Ruf and Edgington,
Faseb 75:385-390 (1994)), tissue factor pathway inhibitor (Shimizu and Mutoh, Adv
Exp Med Biol 407:197-204 (1997); Feuerstein et al., J Lipid Mediat Cell Signal
15:255-284 (1997)), thromboxane A2 receptor (Feuerstein et al., J Lipid Mediat Cell
Signal 15:255-284 (1997); Kinsella et al., Ann NY Acad Sci 774:270-278 (1994);
Patrono and Renda, Am J Cardiol 50:17E-20E (1997)), lipoprotein lipase
(Applebaum-Bowden, Curr Opin Lipidol 6:130-135 (1995)), and factor V (Bertina,
Clin Chem 45:1678-1683 (1997); Harker et al., Haemostasis 7:76-82 (1996);
Koeleman et al., Semin Hematol 54:256-264 (1997)).

Correlation is performed for a population of individuals who have been tested
for the presence or absence of a phenotypic trait of interest and for polymorphic
marker sets. To perform such analysis, the presence or absence of a set of
polymorphisms (i.e., a polymorphic set) is determined for a set of the individuals,
some of whom exhibit a particular trait, and some of whom lack the trait.
The alleles of each polymorphism of the set are then reviewed to determine whether
the presence or absence of a particular allele is associated with the trait of interest.
Correlation can be performed by standard statistical methods such as a chi-squared test,
and statistically significant correlations between polymorphic form(s) and phenotypic
characteristics are noted. For example, it might be found that the presence of allele
A1 at polymorphism A correlates with heart disease. As a further example, it might
be found that the combined presence of allele A1 at polymorphism A and allele B1 at
polymorphism B correlates with increased milk production of a farm animal.
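
As one possible sketch of such a statistical test, a Pearson chi-squared statistic can be computed for a 2x2 table of allele presence versus trait presence. The function name and the counts are hypothetical, no continuity correction is applied, and the p-value expression used is specific to one degree of freedom.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for a 2x2 table.

    a: allele present, trait present    b: allele present, trait absent
    c: allele absent,  trait present    d: allele absent,  trait absent
    Returns (chi-squared statistic, p-value for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele A1 carriers versus non-carriers, with and without the trait.
chi2, p_value = chi_squared_2x2(30, 20, 10, 40)
print(chi2, p_value)
```

A small p-value would indicate that the allele and the trait are not independent in the sampled population; it does not by itself establish a causal contribution.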

Such correlations can be exploited in several ways. In the case of a strong
correlation between a set of one or more polymorphic forms and a disease for which
treatment is available, detection of the polymorphic form set in a human or animal
patient may justify immediate administration of treatment, or at least the institution of
regular monitoring of the patient. Detection of a polymorphic form correlated with
serious disease in a couple contemplating a family may also be valuable to the couple
in their reproductive decisions. For example, the female partner might elect to
undergo in vitro fertilization to avoid the possibility of transmitting such a
polymorphism from her husband to her offspring. In the case of a weaker, but still
statistically significant correlation between a polymorphic set and human disease,
immediate therapeutic intervention or monitoring may not be justified. Nevertheless,
the patient can be motivated to begin simple life-style changes (e.g., diet, exercise)
that can be accomplished at little cost to the patient but confer potential benefits in
reducing the risk of conditions to which the patient may have increased susceptibility
by virtue of variant alleles. Identification of a polymorphic set in a patient correlated
with enhanced receptiveness to one of several treatment regimes for a disease
indicates that this treatment regime should be followed.

For animals and plants, correlations between characteristics and phenotype are
useful for breeding for desired characteristics. For example, Beitz et al., US
5,292,639 discuss use of bovine mitochondrial polymorphisms in a breeding program
to improve milk production in cows. To evaluate the effect of mtDNA D-loop
sequence polymorphism on milk production, each cow was assigned a value of 1 if
variant or 0 if wildtype with respect to a prototypical mitochondrial DNA sequence at
each of 17 locations considered. Each production trait was analyzed individually with
the following animal model:

Y_{ijkpn} = \mu + YS_i + P_j + X_k + \beta_1 + ... + \beta_{17} + PE_n + a_n + e_p

where Y_{ijkpn} is the milk, fat, fat percentage, SNF, SNF percentage, energy
concentration, or lactation energy record; \mu is an overall mean; YS_i is the effect
common to all cows calving in year-season; X_k is the effect common to cows in either
the high or average selection line; \beta_1 to \beta_{17} are the binomial regressions of production
record on mtDNA D-loop sequence polymorphisms; PE_n is permanent environmental
effect common to all records of cow n; a_n is effect of animal n and is composed of the
additive genetic contribution of sire and dam breeding values and a Mendelian
sampling effect; and e_p is a random residual. It was found that eleven of seventeen
polymorphisms tested influenced at least one production trait. Bovines having the
best polymorphic forms for milk production at these eleven loci are used as parents
for breeding the next generation of the herd.

WO 99/50454    PCT/US99/06473

D. Genetic Mapping of Phenotypic Traits

The previous section concerns identifying correlations between phenotypic
traits and polymorphisms that directly or indirectly contribute to those traits. The
present section describes identification of a physical linkage between a genetic locus
associated with a trait of interest and polymorphic markers that are not associated with
the trait, but are in physical proximity with the genetic locus responsible for the trait
and co-segregate with it. Such analysis is useful for mapping a genetic locus
associated with a phenotypic trait to a chromosomal position, and thereby cloning
gene(s) responsible for the trait. See Lander et al., Proc. Natl. Acad. Sci. (USA) 83,
7353-7357 (1986); Lander et al., Proc. Natl. Acad. Sci. (USA) 84, 2363-2367 (1987);
Donis-Keller et al., Cell 51, 319-337 (1987); Lander et al., Genetics 121, 185-199
(1989). Genes localized by linkage can be cloned by a process known as directional
cloning. See Wainwright, Med. J. Australia 159, 170-174 (1993); Collins, Nature
Genetics 1, 3-6 (1992).

Linkage studies are typically performed on members of a family. Available
members of the family are characterized for the presence or absence of a phenotypic
trait and for a set of polymorphic markers. The distribution of polymorphic markers
in an informative meiosis is then analyzed to determine which polymorphic markers
co-segregate with a phenotypic trait. See, e.g., Kerem et al., Science 245, 1073-1080
(1989); Monaco et al., Nature 316, 842 (1985); Yamoka et al., Neurology 40,
222-226 (1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).

Linkage is analyzed by calculation of LOD (log of the odds) values. A lod
value is the relative likelihood of obtaining observed segregation data for a marker
and a genetic locus when the two are located at a recombination fraction θ, versus the
situation in which the two are not linked, and thus segregating independently
(Thompson & Thompson, Genetics in Medicine (5th ed., W.B. Saunders Company,
Philadelphia, 1991); Strachan, "Mapping the human genome" in The Human Genome
(BIOS Scientific Publishers Ltd, Oxford), Chapter 4). A series of likelihood ratios are
calculated at various recombination fractions (θ), ranging from θ = 0.0 (coincident
loci) to θ = 0.50 (unlinked). Thus, the likelihood ratio at a given value of θ is the
probability of the data if the loci are linked at θ divided by the probability of the data
if the loci are unlinked. The computed likelihoods are usually expressed as the log10
of this ratio (i.e., a lod score). For example, a lod score of 3 indicates 1000:1 odds
against an apparent observed linkage being a coincidence. The use of logarithms allows
data collected from different families to be combined by simple addition. Computer
programs are available for the calculation of lod scores for differing values of θ (e.g.,
LIPED, MLINK (Lathrop, Proc. Nat. Acad. Sci. (USA) 81, 3443-3446 (1984))). For any
particular lod score, a recombination fraction may be determined from mathematical
tables. See Smith et al., Mathematical tables for research workers in human genetics
(Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 (1968). The value of θ
at which the lod score is the highest is considered to be the best estimate of the
recombination fraction.
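
A minimal sketch of a lod score calculation is given below for the simplest case, in which every meiosis is fully informative and phase-known, so that each can be scored directly as recombinant or non-recombinant. That simplification, the function names and the example counts are assumptions of this sketch; programs such as LIPED and MLINK handle general pedigrees and are not reproduced here.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for k recombinants observed in n scored meioses."""
    k, n = recombinants, meioses
    if not 0 <= k <= n:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0:
        # Any observed recombinant is impossible under complete linkage.
        return -math.inf if k > 0 else n * math.log10(2.0)
    linked = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    unlinked = n * math.log10(0.5)
    return linked - unlinked


def best_theta(recombinants, meioses, steps=50):
    """Grid search over theta in [0, 0.5] for the value giving the highest lod score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


# Hypothetical family data: 2 recombinants in 20 informative meioses.
theta_hat = best_theta(2, 20)
print(theta_hat, lod_score(2, 20, theta_hat))

# Scores from different families at the same theta are combined by addition.
combined = lod_score(2, 20, 0.1) + lod_score(1, 12, 0.1)
```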

Positive lod score values suggest that the two loci are linked, whereas negative
values suggest that linkage is less likely (at that value of θ) than the possibility that
the two loci are unlinked. By convention, a combined lod score of +3 or greater
(equivalent to greater than 1000:1 odds in favor of linkage) is considered definitive
evidence that two loci are linked. Similarly, by convention, a negative lod score of -2
or less is taken as definitive evidence against linkage of the two loci being compared.
Negative linkage data are useful in excluding a chromosome or a segment thereof
from consideration. The search focuses on the remaining non-excluded chromosomal
locations.

IV. Modified Polypeptides and Gene Sequences 

The invention further provides variant forms of nucleic acids and 
corresponding proteins. The nucleic acids comprise one of the sequences described in
the Table, column 8, in which the polymorphic position is occupied by one of the
alternative bases for that position. Some nucleic acids encode full-length variant 
forms of proteins. Similarly, variant proteins have the prototypical amino acid 
sequences encoded by nucleic acid sequences shown in the Table, column 8, (read so 
as to be in-frame with the full-length coding sequence of which it is a component)
except at an amino acid encoded by a codon including one of the polymorphic
positions shown in the Table. That position is occupied by the amino acid coded by 
the corresponding codon in any of the alternative forms shown in the Table. 

Variant genes can be expressed in an expression vector in which a variant gene 
is operably linked to a native or other promoter. Usually, the promoter is a eukaryotic
promoter for expression in a mammalian cell. The transcription regulation sequences
typically include a heterologous promoter and optionally an enhancer which is 
recognized by the host. The selection of an appropriate promoter, for example trp, 
lac, phage promoters, glycolytic enzyme promoters and tRNA promoters, depends on 
the host selected. Commercially available expression vectors can be used. Vectors
can include host-recognized replication systems, amplifiable genes, selectable
markers, host sequences useful for insertion into the host genome, and the like.

The means of introducing the expression construct into a host cell varies
depending upon the particular construction and the target host. Suitable means 
include fusion, conjugation, transfection, transduction, electroporation or injection, as 
described in Sambrook, supra. A wide variety of host cells can be employed for 
expression of the variant gene, both prokaryotic and eukaryotic. Suitable host cells
include bacteria such as E. coli, yeast, filamentous fungi, insect cells, mammalian 
cells, typically immortalized, e.g., mouse, CHO, human and monkey cell lines and 
derivatives thereof. Preferred host cells are able to process the variant gene product to 
produce an appropriate mature polypeptide. Processing includes glycosylation, 
ubiquitination, disulfide bond formation, general post-translational modification, and
the like. As used herein, "gene product" includes mRNA, peptide and protein 
products. 

The protein may be isolated by conventional means of protein biochemistry 
and purification to obtain a substantially pure product, i.e., 80, 95 or 99% free of cell
component contaminants, as described in Jacoby, Methods in Enzymology Volume
104, Academic Press, New York (1984); Scopes, Protein Purification, Principles and 
Practice, 2nd Edition, Springer-Verlag, New York (1987); and Deutscher (ed), Guide 
to Protein Purification, Methods in Enzymology, Vol. 182 (1990). If the protein is 
secreted, it can be isolated from the supernatant in which the host cell is grown. If not
secreted, the protein can be isolated from a lysate of the host cells.

The invention further provides transgenic nonhuman animals capable of 
expressing an exogenous variant gene and/or having one or both alleles of an 
endogenous variant gene inactivated. Expression of an exogenous variant gene is 
usually achieved by operably linking the gene to a promoter and optionally an
enhancer, and microinjecting the construct into a zygote. See Hogan et al.,
"Manipulating the Mouse Embryo, A Laboratory Manual," Cold Spring Harbor 
Laboratory. Inactivation of endogenous variant genes can be achieved by forming a 
transgene in which a cloned variant gene is inactivated by insertion of a positive 
selection marker. See Capecchi, Science 244, 1288-1292 (1989). The transgene is
then introduced into an embryonic stem cell, where it undergoes homologous
recombination with an endogenous variant gene. Mice and other rodents are preferred
animals. Such animals provide useful drug screening systems. 

In addition to substantially full-length polypeptides expressed by variant 
genes, the present invention includes biologically active fragments of the
polypeptides, or analogs thereof, including organic molecules which simulate the
interactions of the peptides. Biologically active fragments include any portion of the
full-length polypeptide which confers a biological function on the variant gene
product, including ligand binding, and antibody binding. Ligand binding includes
binding by nucleic acids, proteins or polypeptides, small biologically active 
molecules, or large cellular structures. 

Polyclonal and/or monoclonal antibodies that specifically bind to variant gene 
products but not to corresponding prototypical gene products are also provided.
Antibodies can be made by injecting mice or other animals with the variant gene
product or synthetic peptide fragments thereof. Monoclonal antibodies are screened
as described, for example, in Harlow & Lane, Antibodies, A Laboratory Manual,
Cold Spring Harbor Press, New York (1988); Goding, Monoclonal antibodies,
Principles and Practice (2d ed.) Academic Press, New York (1986). Monoclonal
antibodies are tested for specific immunoreactivity with a variant gene product and 
lack of immunoreactivity to the corresponding prototypical gene product. These 
antibodies are useful in diagnostic assays for detection of the variant form, or as an 
active ingredient in a pharmaceutical composition. 

V. Kits

The invention further provides kits comprising at least one allele-specific 
oligonucleotide as described above. Often, the kits contain one or more pairs of 
allele-specific oligonucleotides hybridizing to different forms of a polymorphism. In 
some kits, the allele-specific oligonucleotides are provided immobilized to a substrate.
For example, the same substrate can comprise allele-specific oligonucleotide probes
for detecting at least 10, 100 or all of the polymorphisms shown in the Table. 
Optional additional components of the kit include, for example, restriction enzymes, 
reverse-transcriptase or polymerase, the substrate nucleoside triphosphates, means 
used to label (for example, an avidin-enzyme conjugate and enzyme substrate and
chromogen if the label is biotin), and the appropriate buffers for reverse transcription,
PCR, or hybridization reactions. Usually, the kit also contains instructions for 
carrying out the methods. 

The following Examples are offered for the purpose of illustrating the present 
invention and are not to be construed to limit the scope of this invention. The
teachings of all references cited herein are hereby incorporated herein by reference.

EXAMPLES 

The polymorphisms shown in the Table were identified by resequencing of 
target sequences from a minimum of 50 unrelated individuals of diverse ethnic and 
geographic backgrounds by hybridization to probes immobilized to microfabricated
arrays. The strategy and principles for design and use of such arrays are generally
described in WO 95/11995. 

A typical probe array used in this analysis has two groups of four sets of
probes that respectively tile both strands of a reference sequence. A first probe set
comprises a plurality of probes exhibiting perfect complementarity with one of the
reference sequences. Each probe in the first probe set has an interrogation position
that corresponds to a nucleotide in the reference sequence. That is, the interrogation
position is aligned with the corresponding nucleotide in the reference sequence, when
the probe and reference sequence are aligned to maximize complementarity between
the two. For each probe in the first set, there are three corresponding probes from
three additional probe sets. Thus, there are four probes corresponding to each
nucleotide in the reference sequence. The probes from the three additional probe sets
are identical to the corresponding probe from the first probe set except at the
interrogation position, which occurs in the same position in each of the four
corresponding probes from the four probe sets, and is occupied by a different
nucleotide in the four probe sets. In the present analysis, probes were 25 nucleotides
long. Arrays tiled for multiple different reference sequences were included on the
same substrate.
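
The four-probe tiling described above can be sketched as follows for one strand of a reference sequence. The function name and the example reference are hypothetical, and placing the interrogation position at the centre of each 25-nucleotide probe is an assumption of this sketch rather than a detail stated above; tiling the second strand would apply the same function to the reverse complement of the reference.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tile_probes(reference, probe_length=25):
    """Return [(interrogated reference index, {target base: probe sequence}), ...].

    For each window of the reference, four probes are produced. The probe keyed
    by the reference base is perfectly complementary to the reference; the other
    three differ from it only at the interrogation position.
    """
    ref = reference.upper()
    centre = probe_length // 2  # assumed location of the interrogation position
    tiles = []
    for start in range(len(ref) - probe_length + 1):
        window = ref[start:start + probe_length]
        probes = {}
        for base in "ACGT":
            target = window[:centre] + base + window[centre + 1:]
            # A probe is the reverse complement of the target it should bind.
            probes[base] = target.translate(COMPLEMENT)[::-1]
        tiles.append((start + centre, probes))
    return tiles


# Hypothetical 27-base reference, giving three tiled positions.
reference = "ACGTACGTACGTACGTACGTACGTACG"
for position, probes in tile_probes(reference):
    print(position, reference[position], probes[reference[position]])
```

In such a layout, a sample matching the reference would be expected to hybridize best to the probe keyed by the reference base, while a variant sample would favour one of the other three probes at the polymorphic position.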

Publicly available sequences for a given gene were assembled into Gap4
(http://www.biozentrum.unibas.ch/~biocomp/staden/Overview.html). PCR primers
covering each exon were designed using Primer 3
(http://www-genome.wi.mit.edu/cgi-bin/primer/primer3.cgi). Primers were not designed in regions
where there were sequence discrepancies between reads. For CLA1, whose genomic
sequence is not published, nested primers were designed from the cDNA. For all
genes except CLA1, genomic DNA was amplified in at least 50 individuals using 2.5
pmol each primer, 1.5 mM MgCl2, 100 dNTPs, 0.75 AmpliTaq GOLD
polymerase, and 19 ng DNA in a 15 µl reaction. Reactions were assembled using a
PACKARD MultiPROBE robotic pipetting station and then put in MJ 96-well tetrad
thermocyclers (96°C for 10 minutes, followed by 35 cycles of 96°C for 30 seconds,
59°C for 2 minutes, and 72°C for 2 minutes). A subset of the PCR assays for each
individual was run on 3% NuSieve gels in 0.5X TBE to confirm that the reaction
worked.

For CLA1, first strand cDNA was made using the Gibco BRL Superscript
Preamplification Kit (#18089-011) and following the manufacturer's instructions
except that 150 ng of random hexamers were used to prime 1 µg of total RNA. The
cDNA was amplified using the outermost primer pairs and the above conditions; 1/20
of the reaction was used as a template for the secondary PCR using the innermost
primers. All RT-PCR products were run on 2% NuSieve gels in 1X TAE to confirm
the presence of a product.

For a given DNA, 5 ul (about 50 ng) of each PCR or RT-PCR product were
pooled (final volume = 150-200 ul). The products were purified using QiaQuick
PCR purification from Qiagen. The samples were eluted once in 35 ul sterile water
and 4 ul 10X One-Phor-All buffer (Pharmacia). The pooled samples were digested
with 0.2 u DNaseI (Promega) for 10 minutes at 37 °C and then labeled with 0.5 nmols
biotin-N6-ddATP and 15 u Terminal Transferase (GibcoBRL Life Technology) for 60
minutes at 37 °C. Both fragmentation and labeling reactions were terminated by
incubating the pooled sample for 15 minutes at 100°C.

Low-density DNA chips (Affymetrix, CA) were hybridized following the
manufacturer's instructions. Briefly, the hybridization cocktail consisted of 3 M
TMACl, 10 mM Tris pH 7.8, 0.01% Triton X-100, 100 mg/ml herring sperm DNA
(Gibco BRL), and 200 pM control biotin-labeled oligo. The processed PCR products
were denatured for 7 minutes at 100°C and then added to prewarmed (37°C)
hybridization solution. The chips were hybridized overnight at 44 °C. Chips were
washed in 1X SSPET and 6X SSPET followed by staining with 2 ug/ml SARPE and
0.5 mg/ml acetylated BSA in 200 ul of 6X SSPET for 8 minutes at room temperature.
Chips were scanned using a Molecular Dynamics scanner.

Chip image files were analyzed using Ulysses (Affymetrix, CA), which uses 
four algorithms to identify potential polymorphisms. Candidate polymorphisms were 
visually inspected and assigned a confidence value: high confidence candidates 
displayed all three genotypes, while likely candidates showed only two genotypes 
(homozygous for reference sequence and heterozygous for reference and variant). 
Some of the candidate polymorphisms were confirmed by ABI sequencing. Identified 
polymorphisms were compared to SwissProt and the Mutation Database to determine 
if they were novel. Results are shown in the Table. 
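
The confidence rule stated above (all three genotypes observed versus only reference homozygotes and heterozygotes) can be sketched as a small function. This is an illustration only, not the Ulysses software or the visual inspection actually used; the genotype labels `"ref/ref"`, `"ref/var"` and `"var/var"` and the function name are hypothetical.

```python
ALLOWED_GENOTYPES = {"ref/ref", "ref/var", "var/var"}


def candidate_confidence(genotype_calls):
    """Assign a confidence label from the genotype calls seen at one site.

    "high confidence": all three genotypes were observed.
    "likely": only reference homozygotes and heterozygotes were observed.
    Any other combination is returned as "unclassified", because the text
    does not say how such cases were handled.
    """
    observed = set(genotype_calls)
    unknown = observed - ALLOWED_GENOTYPES
    if unknown:
        raise ValueError(f"unexpected genotype labels: {sorted(unknown)}")
    if observed == ALLOWED_GENOTYPES:
        return "high confidence"
    if observed == {"ref/ref", "ref/var"}:
        return "likely"
    return "unclassified"


if __name__ == "__main__":
    print(candidate_confidence(["ref/ref", "ref/var", "ref/ref"]))
```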

In the Table, the genes listed in column 2 are as follows: antithrombin III 
(AT3); cholesterol ester transfer protein (CETP); CLanalog (HDL/scavenger receptor) 
(CLanalog); thrombin receptor (F2R); thrombin (F2); heparin cofactor II (HCF2); 
HMG CoA-reductase (HMGCR); platelet glycoprotein IIb (ITGA2B); platelet 
glycoprotein IIIa (ITGB3); lecithin:cholesterol acyltransferase (LCAT); LDL 
receptor (LDLR); protein C (PROC); platelet activating factor receptor (PTAFR); 
tissue factor pathway inhibitor (TFPI); thromboxane A2 receptor (TBXA2R); 
lipoprotein lipase (LPL); tissue factor (F3); and factor V (F5). 

Column 1 of the Table shows the laboratory name for the particular gene. 
Column 3 shows the GenBank Accession number for the wild type (reference) allele. 
Column 4 shows the nucleotide number location of the polymorphism relative to the 
numbering of the sequence deposited with GenBank having the listed Accession 
number; the GenBank sequence is understood to be the nucleotide sequence present in 
the GenBank database on April 1, 1998, which sequences are incorporated herein by 
reference in their entirety. These GenBank sequences are illustrated in Figures 3-38. 
Column 5 shows the codon which is altered by the polymorphism. Columns 
6, 7 and 8 show the reference codon, variant codon and amino acid change, 
respectively, for the silent polymorphisms. Columns 9, 10 and 11 show the reference 
codon, variant codon and amino acid change, respectively, for the missense 
polymorphisms. Columns 12, 13 and 14 show the reference codon, variant codon and 
amino acid change, respectively, for the nonsense polymorphisms. Columns 15 and 
16 show the nucleotide of the reference allele and the frequency of that allele, 
respectively. This base is arbitrarily designated the reference or prototypical form, 
but it is not necessarily the most frequently occurring form. Columns 17 and 18 show 
the nucleotide of the variant allele and the frequency of that allele, respectively. It is 
noted that the genes with polymorphism IDs of F5u8, HCF2u1 and HMGCRu2 
contained the indicated polymorphism at the indicated nucleotide position, but that 
these nucleotide positions are in the non-coding region of the gene. 
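
The distinction between silent, missense and nonsense polymorphisms drawn in the column descriptions above follows from the standard genetic code. The sketch below shows one way to make that classification and to convert a nucleotide position into a codon number. It is illustrative only: the function names are hypothetical, the codon pairs in the usage example are arbitrary inputs rather than entries from the Table, and `codon_number` assumes codons are counted from the first base of the coding sequence, which may differ from the numbering convention used in the Table.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_number(nt_position, cds_start):
    """1-based codon index of a nucleotide, counting from the CDS start.

    Assumes both positions use the same 1-based sequence numbering.
    """
    if nt_position < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    return (nt_position - cds_start) // 3 + 1


def classify_codon_change(ref_codon, var_codon):
    """Return (class, reference amino acid, variant amino acid)."""
    ref_codon, var_codon = ref_codon.upper(), var_codon.upper()
    if ref_codon == var_codon:
        raise ValueError("reference and variant codons are identical")
    ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
    if ref_aa == var_aa:
        kind = "silent"
    elif var_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "other"  # loss of a stop codon; not one of the three Table classes
    else:
        kind = "missense"
    return kind, ref_aa, var_aa


if __name__ == "__main__":
    for ref, var in [("ATG", "ACG"), ("CTG", "CTA"), ("TGG", "TGA")]:
        print(ref, var, classify_codon_change(ref, var))
```

A single-letter amino acid code is used, with `*` standing for a stop codon.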
Genotyping and genetic association studies were performed with respect to the 
allelic forms of the F5U4 and HCF2U4 genes, and the presence of the reference and 
variant alleles (as shown in Table 1) was correlated with the occurrence of venous 
thrombosis and pulmonary emboli. The results are shown in Tables 2 and 3. 

TABLE 2: HCF2U4 GENETIC ASSOCIATION STUDY 

                  Case    Control 
Reference          115        115 
Heterozygote         5          0 

(p = 0.027 by Chi-square test) 
(p = 0.06 by Fisher's exact test (two-tailed)) 
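
The two p-values quoted for Table 2 can be checked from the four counts alone. The sketch below is not the software used for the study; it assumes a Pearson chi-square test without continuity correction (1 degree of freedom) and a two-sided Fisher's exact test that sums all tables no more probable than the observed one. Function names are illustrative.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]].

    No continuity correction; for 1 degree of freedom the upper tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Rows: Reference, Heterozygote; columns: Case, Control (Table 2).
    stat, p_chi = chi_square_2x2(115, 115, 5, 0)
    p_fisher = fisher_exact_two_sided(115, 115, 5, 0)
    print(f"chi-square = {stat:.3f}, p = {p_chi:.3f}; Fisher p = {p_fisher:.2f}")
```

Under these assumptions the counts in Table 2 give values that round to the p-values stated above.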

The F5u4 variant leads to an amino acid substitution (Met413Thr) in the 
coagulation factor V gene. Another common variant in Factor V (Arg506Gln), the 
Leiden Variant, is the most common genetic factor predisposing to thrombosis that 
has been identified to date. Genotyping of patients with deep venous thrombosis has 
confirmed a statistical association of this variant with deep venous 
thrombosis/pulmonary embolism in two separate populations of patients, as shown 
below: 



TABLE 3: F5U4 GENETIC ASSOCIATION STUDY 

                                                ALLELE FREQ 
                   REF    HET    VAR    TOTAL    REF    VAR 
  Case             226     38      5      269    91%     9% 
  Control          207     28      0      235    94%     6% 
2nd Population 
  Case              85     28      2      115    86%    14% 
  Control           95     14      4      113    90%    10% 

(p < 0.05 by Chi-square test for combined populations) 
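
The allele frequency columns of Table 3 follow from the genotype counts by simple allele counting: each individual carries two alleles, so reference homozygotes contribute two reference alleles and heterozygotes one. The sketch below illustrates that arithmetic; the function name is a hypothetical choice and the code is not taken from the study.

```python
def allele_frequencies(n_ref_hom, n_het, n_var_hom):
    """Return (reference, variant) allele frequencies from genotype counts."""
    total_alleles = 2 * (n_ref_hom + n_het + n_var_hom)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    ref_freq = (2 * n_ref_hom + n_het) / total_alleles
    return ref_freq, 1 - ref_freq


if __name__ == "__main__":
    # (REF, HET, VAR) genotype counts copied from Table 3.
    rows = {
        "Case": (226, 38, 5),
        "Control": (207, 28, 0),
        "2nd Population Case": (85, 28, 2),
        "2nd Population Control": (95, 14, 4),
    }
    for label, counts in rows.items():
        ref, var = allele_frequencies(*counts)
        print(f"{label}: REF {ref:.0%}, VAR {var:.0%}")
```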
These data indicate that there is a trend toward an association between the 
presence of the variant allele (or heterozygosity) and the occurrence of venous 
thrombosis and/or pulmonary emboli. 

From the foregoing, it is apparent that the invention includes a number of 
general uses that can be expressed concisely as follows. The invention provides for 
the use of any of the nucleic acid segments described above in the diagnosis or 
monitoring of diseases, such as cancer, inflammation, heart disease, diseases of the 
cardiovascular system, and infection by microorganisms. The invention further 
provides for the use of any of the nucleic acid segments in the manufacture of a 
medicament for the treatment or prophylaxis of such diseases. The invention further 
provides for the use of any of the DNA segments as a pharmaceutical. 

All references cited above are incorporated by reference in their entirety for all 
purposes to the same extent as if each individual publication or patent application 
were specifically and individually indicated to be so incorporated by reference. 

While this invention has been particularly shown and described with 
reference to preferred embodiments thereof, it will be understood by those skilled 
in the art that various changes in form and details may be made therein without 
departing from the spirit and scope of the invention as defined by the appended 
claims. 
CLAIMS 

WE CLAIM: 

1. A nucleic acid molecule selected from the group consisting of the genes listed 
in the Table, wherein said nucleic acid molecule is at least 5 nucleotides in 
length and comprises a polymorphic site identified in the Table, wherein a 
nucleotide at the polymorphic site is different from a nucleotide at the 
polymorphic site in a corresponding reference allele. 

2. A nucleic acid molecule according to Claim 1, wherein said nucleic acid 
molecule is at least 10 nucleotides in length. 

3. A nucleic acid molecule according to Claim 1, wherein said nucleic acid 
molecule is at least 20 nucleotides in length. 

4. A nucleic acid molecule according to Claim 1, wherein the nucleotide at the 
polymorphic site is the variant nucleotide for the gene listed in the Table. 

5. An allele-specific oligonucleotide that hybridizes to a portion of a gene 
selected from the group consisting of the genes listed in the Table, wherein 
said portion is at least 5 nucleotides in length and comprises a polymorphic 
site identified in the Table, wherein a nucleotide at the polymorphic site is 
different from a nucleotide at the polymorphic site in a corresponding 
reference allele. 

6. An allele-specific oligonucleotide according to Claim 5 that is a probe. 

7. An allele-specific oligonucleotide according to Claim 5, wherein a central 
position of the probe aligns with the polymorphic site of the portion. 

8. An allele-specific oligonucleotide according to Claim 5 that is a primer. 

9. An allele-specific oligonucleotide according to Claim 8, wherein the 3' end of 
the primer aligns with the polymorphic site of the portion. 

10. An isolated gene product encoded by a nucleic acid molecule according to 
Claim 1. 

11. A method of analyzing a nucleic acid sample, comprising obtaining the 
nucleic acid from an individual sample; and determining a base occupying any 
one of the polymorphic sites shown in the Table. 

12. A method according to Claim 11, wherein the nucleic acid sample is obtained 
from a plurality of individuals, and a base occupying one of the polymorphic 
positions is determined in each of the individuals, and the method further 
comprising testing each individual for the presence of a disease phenotype, 
and correlating the presence of the disease phenotype with the base. 
AT3u1     D29832:1005 
AT3u2     D29832:1035 
AT3u3     M21645:100 
AT3u4     D29832:1374 
CETPu1    M30185:1298 
CETPu2    M30185:1394 
CETPu3    M30185:991 
CETPu4    M30185:196 
CETPu5 
LOCUS       HUMPAFRE     1780 bp    mRNA            PRI       10-OCT-1992 
DEFINITION  Human mRNA for platelet-activating factor receptor, complete cds. 
ACCESSION   D10202 D90433 
NID         g219975 
KEYWORDS    G-protein coupled receptor; PAF receptor; platelet-activating 
            factor receptor. 
SOURCE      Human leukocytes cDNA to mRNA. 
  ORGANISM  Homo sapiens 
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; 
            Vertebrata; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; 
            Homo. 
REFERENCE   1  (bases 1 to 1780) 
  AUTHORS   Nakamura,M., Honda,Z., Izumi,T., Sakanaka,C., Mutoh,H., Minami,M., 
            Bito,H., Seyama,Y., Noma,M., Matsumoto,T. and Shimizu,T. 
  TITLE     Molecular cloning and expression of platelet-activating factor 
            receptor from human leukocytes 
  JOURNAL   J. Biol. Chem. 266 (30), 20400-20405 (1991) 
  MEDLINE   92041873 
REFERENCE   2  (bases 1 to 1780) 
  AUTHORS   Shimizu,T. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (28-JUN-1991) to the DDBJ/EMBL/GenBank databases. Takao 
            Shimizu, Faculty of Medicine, University of Tokyo, Department of 
            Biochemistry; 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan 
            (Tel:03-3812-2111(ex.3448), Fax:03-3813-8732) 
COMMENT     Submitted (28-Jun-1991) to DDBJ by: 
            Takao Shimizu 
            Department of Biochemistry 
            Faculty of Medicine, University of Tokyo 
            7-3-1 Hongo, Bunkyo-ku 
            Tokyo 113 
            Japan 
            Phone: 03-3812-2111 x3448 
            Fax: 03-3813-8732. 
FEATURES             Location/Qualifiers 
     source          1..1780 
                     /organism="Homo sapiens" 
                     /db_xref="taxon:9606" 
                     /cell_type="leukocytes" 
     CDS             113..1141 
                     /codon_start=1 
                     /product="platelet-activating factor receptor" 
                     /db_xref="PID:d1001519" 
                     /db_xref="PID:g219976" 
                     /translation="MEPHDSSHMDSEFRYTLFPIVYSIIFVLGVIANGYVLWVFARLY 
                     PCKKFNEIKIFMVNLTMADMLFLITLPLWIVTYQNQGNWILPKFLCNVAGCLFFINTY 
                     CSVAFLGVITYNRFQAVTRPIKTAQANTRKRGISLSLVIWVAIVGAASYFLILDSTNT 
                     VPDSAGSGNVTRCFEHYEKGSVPVLIIHIFIVFSFFLVFLIILFCNLVIIRTLLMQPV 
                     QQQRNAEVKRRALWMVCTVLAVFIICFVPHHVVQLPWTLAELGFQDSKFHQAINDAHQ 
                     VTLCLLSTNCVLDPVIYCFLTKKFRKHLTEKFYSMRSSRKCSRATTDTVTEVVVPFNQ 
                     IPGNSLKN" 

FIG. 3A 

WO 99/50454                                                  PCT/US99/06473 

BASE COUNT      393 a    533 c    438 g    416 t 
ORIGIN 
        1 ttcacgaggg ctggggccag gacccagaca gagacacacg gtcactgcag ctgaagccgc 
       61 tgcccctgct acaggcacca ccaggaccag ctgatcattc cagcccacag caatggagcc 
      121 acatgactcc tcccacatgg actctgagtt ccgatacact ctcttcccga ttgtttacag 
      181 catcatcttt gtgctcgggg tcattgctaa tggctacgtg ctgtgggtct ttgcccgcct 
      241 gtacccttgc aagaaattca atgagataaa gatcttcatg gtgaacctca ccatggcgga 
      301 catgctcttc ttgatcaccc tgccactttg gattgtctac taccaaaacc agggcaactg 
      361 gatactcccc aaattcctgt gcaacgtggc tggctgcctt ttcttcatca acacctactg 
      421 ctctgtggcc ttcctgggcg tcatcactta taaccgcttc caggcagtaa ctcggcccat 
      481 caagactgct caggccaaca cccgcaagcg tggcatctct ttgtccttgg tcatctgggt 
      541 ggccattgtg ggagctgcat cctacttcct catcctggac tccaccaaca cagtgcccga 
      601 cagtgctggc tcaggcaacg tcactcgctg ctttgagcat tacgagaagg gcagcgtgcc 
      661 agtcctcatc atccacatct tcatcgtgtt cagcttcttc ctggtcttcc tcatcatcct 
      721 cttctgcaac ctggtcatca tccgcacctt gctcatgcag ccggtgcagc agcagcgcaa 
      781 cgctgaagtc aagcgccggg cgctgtggat ggtgtgcacg gtcttggcgg tgttcatcat 
      841 ctgcttcgtg ccccaccacg tggtgcagct gccctggacc cttgctgagc tgggcttcca 
      901 ggacagcaaa ttccaccagg ccattaatga tgcacatcag gtcaccctct gcctccttag 
      961 caccaactgt gtcttagacc ctgttatcta ctgtttcctc accaagaagt tccgcaagca 
     1021 cctcaccgaa aagttctaca gcatgcgcag tagccggaaa tgctcccggg ccaccacgga 
     1081 tacggccact gaagtggttg tgccattcaa ccagatccct ggcaattccc tcaaaaatta 
     1141 gtccctgctt ccaggcctga agtcttctcc tccatgaaac atcatgactg agctggggga 
     1201 agaagggata tctactgtgg gtctgggcac cacctctgtg gcactggtgg gccattagat 
     1261 ttggaggcta cctcacctgg gcagggatga tgcagagcca ggctgttgga aaatccagaa 
     1321 ctcaaatgag ccccttcatc cgcctgtggg cgcatactac agtaactgtg actgatgact 
     1381 ttatcctgag tcccttaatc ttatggggcc ggaaggaatg tcagggccag gtgcagacct 
     1441 tgggggaaga ctttaaacca cctagttctc ccactggggc atcggtctaa agctttgggg 
     1501 gagtggcccc agtggctcac acctgtaatc ccagcacttt gggaggccga ggtgggcaga 
     1561 tcacgggtca agagatcgag acatcctggc caacattgta aaaccccatc tctactaaaa 
     1621 catacaaaaa ttagccgggc atggtgcaca cgcctgtagt cccagctact caggaggctg 
     1681 aggcaggaga atcgcttgaa cctgggaggc agaggttgca gtgaacctag attgcaccat 
     1741 tgcactctag cctggcaaca gaggcagatt ccctcctgcc 
LOCUS       HUMATIIIV    1467 bp    mRNA            PRI       03-SEP-1996 
DEFINITION  Human mRNA for antithrombin III variant, complete cds. 
ACCESSION   D29832 
NID         g576553 
KEYWORDS    AT-III; antithrombin III. 
SOURCE      Homo sapiens (individual-isolate AT-III Kyoto) cDNA to mRNA, clone 
            pKF16c. 
  ORGANISM  Homo sapiens 
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; 
            Vertebrata; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; 
            Homo. 
REFERENCE   1  (sites) 
  AUTHORS   Tsuji,H., Takada,O., Nakagawa,M., Tanaka,S. and Hashimoto-Gotoh,T. 
  TITLE     Hereditary antithrombin III deficiency: identification of an 
            arginine-406 to methionine point mutation near protease reactive 
            site 
  JOURNAL   (in) Yoshida,T.O. and Wilson,J.M. (Eds.); 
            MOLECULAR APPROACHES TO THE STUDY AND TREATMENT OF HUMAN DISEASES: 
            51-55; 
            Elsevier Science (1992) 
REFERENCE   2  (bases 1 to 1467) 
  AUTHORS   Hashimoto-Gotoh,T. 
  JOURNAL   Unpublished (1994) 
FEATURES             Location/Qualifiers 
     source          1..1467 
                     /organism="Homo sapiens" 
                     /db_xref="taxon:9606" 
     CDS             22..1419 
                     /note="Wild type AT-III has 'g' instead of 't' at 
                     1337 nt. Also amino acid residue changes from Met to Arg 
                     at position 406 aa in wild type AT-III." 
                     /codon_start=1 
                     /product="antithrombin III (AT-III) variant" 
                     /db_xref="PID:d1006776" 
                     /db_xref="PID:g576554" 
                     /translation="MYSNVIGTVTSGKRKVYLLSLLLIGFWDCVTCHGSPVDICTAKP 
                     RDIPMNPMCIYRSPEKKATEDEGSEQKIPEATNNRRVWELSKANSRFATTFYQHLADS 
                     KNDNDNIFLSPLSISTAFAMTKLGACNDTLQQLMEVFKFDTISEKTSDQIHFFFAKLN 
                     CRLYRKANKSSKLVSANRLFGDKSLTFNETYQDISELVYGAKLQPLDFKENAEQSRAA 
                     INKWVSNKTEGRITDVIPSEAINELTVLVLVNTIYFKGLWKSKFSPENTRKELFYKAD 
                     GESCSASMMYQEGKFRYRRVAEGTQVLELPFKGDDITMVLILPKPEKSLAKVEKELTP 
                     EVLQEWLDELEEMMLVVHMPRFRIEDGFSLKEQLQDMGLVDLFSPEKSKLPGIVAEGR 
                     DDLYVSDAFHKAFLEVNEEGSEAAASTAVVIAGRSLNPNRVTFKANMPFLVFIREVPL 
                     NTIIFMGRVANPCVK" 
BASE COUNT      381 a    375 c    364 g    347 t 
ORIGIN 

1 gaattcgagc tcgccccggc catgtattcc aatgtgatag gaactgtaac ctctggaaaa 

61 aggaaggttt atctcttgtc cttgctgctc attggcttct gggactgcgt gacctgtcac 

121 gggagccccg tggacatctg cacagccaag ccgcgggaca ttcccatgaa ccccatgtgc 

181 atttaccgct ccccggagaa gaaggcaact gaggatgagg gctcagaaca gaagatcccg 

241 gaggccacca acaaccggcg tgtctgggaa ctgtccaagg ccaattcccg ctttgctacc 

301 actttctatc agcacctggc agattccaag aatgacaatg ataacatttt cctgtcaccc 

361 ctgagtatct ctacggcttt tgctatgacc aagctgggtg cctgtaatga caccctccag 

421 caactgatgg aggtatttaa gtttgacacc atatctgaga aaacatctga tcagatccac 

481 ttcttctttg ccaaactgaa ctgccgactc tatcgaaaag ccaacaaatc ctccaagtta 

541 gtatcagcca atcgcctttt tggagacaaa tcccttacct tcaatgagac ctaccaggac 

601 atcagtgagt tggtatatgg agccaagctc cagcccctgg acttcaagga aaatgcagag 

661 caatccagag cggccatcaa caaatgggtg tccaataaga ccgaaggccg aatcaccgat 

721 gtcattccct cggaagccat caatgagctc actgttctgg tgctggttaa caccatttac 

781 ttcaagggcc tgtggaagtc aaagttcagc cctgagaaca caaggaagga actgttctac 

841 aaggctgatg gagagtcgtg ttcagcatct atgatgtacc aggaaggcaa gttccgttat 

901 cggcgcgtgg ctgaaggcac ccaggtgctt gagttgccct tcaaaggtga tgacatcacc 

961 atggtcctca tcttgcccaa gcctgagaag agcctggcca aggtggagaa ggaactcacc 

1021 ccagaggtgc tgcaggagtg gctggatgaa ttggaggaga tgatgctggt ggtccacatg 

1081 ccccgcctcc gcattgagga cggcttcagt ttgaaggagc agctgcaaga catgggcctt 

1141 gtcgatctgt tcagccctga aaagtccaaa ctcccaggta ttgttgcaga aggccgagat 

1201 gacctctatg tctcagatgc attccataag gcatttcttg aggcaaatga agaaggcagt 

1261 gaagcagctg caagtaccgc tgttgtgatt gctggccgtt cgctaaaccc caacagggtg 

1321 actttcaagg ccaacatgcc tttcctggtt tttataagag aagttcctcc gaacactatt 

1381 atcttcatgg gcagggtagc caacccttgt gttaagtaaa atgttctcta gaggatcccc 

1441 catcgatggg gtaccgagct cgaattc 



FIG. 4B 
LOCUS       HUMHTAR      2932 bp    mRNA            PRI       03-APR-1996 
DEFINITION  Human mRNA for thromboxane A2 receptor, complete cds. 
ACCESSION   D38081 
NID         g533325 
KEYWORDS    thromboxane A2 receptor. 
SOURCE      Homo sapiens placenta cDNA to mRNA, clone HPL. 
  ORGANISM  Homo sapiens 
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; 
            Vertebrata; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; 
            Homo. 
REFERENCE   1  (bases 1 to 2932) 
  AUTHORS   Hirata,M., Hayashi,Y., Ushikubi,F., Yokota,Y., Kageyama,R., 
            Nakanishi,S. and Narumiya,S. 
  TITLE     Cloning and expression of cDNA for human thromboxane A2 receptor 
  JOURNAL   Nature 349 (6310), 617-620 (1991) 
  MEDLINE   91156030 
REFERENCE   2  (sites) 
  AUTHORS   Nusing,R.M., Hirata,M., Kakizuka,A., Eki,T., Ozawa,K. and 
            Narumiya,S. 
  TITLE     Characterization and chromosomal mapping of the human thromboxane 
            A2 receptor gene 
  JOURNAL   J. Biol. Chem. 268 (33), 25253-25259 (1993) 
  MEDLINE   94043399 
REFERENCE   3  (bases 1 to 2932) 
  AUTHORS   Hirata,M. 
  TITLE     Direct Submission 
  JOURNAL   Submitted (26-AUG-1994) to the DDBJ/EMBL/GenBank databases. 
            Masakazu Hirata, Kyoto University Faculty of Medicine, Department 
            of Pharmacology; Yoshida, Sakyo-ku, Kyoto, Kyoto 606, Japan 
            (Tel:81-75-753-4392, Fax:81-75-753-4693) 
FEATURES             Location/Qualifiers 
     source          1..2932 
                     /organism="Homo sapiens" 
                     /db_xref="taxon:9606" 
                     /tissue_type="placenta" 
                     1..705 
                     /note="This part of the cDNA clone may not belong to the 
                     thromboxane A2 receptor gene. Please refer to Nuesing, 
                     R.M. et al. (reference2)" 
     CDS             992..2023 
                     /codon_start=1 
                     /evidence=experimental 
                     /product="Human thromboxane A2 receptor" 
                     /db_xref="PID:d1007852" 
                     /db_xref="PID:g533326" 

/ trans la t ion= " MWPNGSSLGPCFRPTNITLEERRLI AS PWFAASFCWGLASNLL 

ALSVTAGARQGGSHTRSSFLTFLCGLVXTDFI^LLVTGTIWSQHAALFEWHAVDPGC 

RLCRFMGWMI FFGLSPLLLGAAMASERYLGITRPFS RPAVASQRRAWATVGLVWAAA 

LALGLLPLIX^GRYTVQYPGSWCFLTLGAESGBVA^ 

VATI£HVYHGQEAAQQRPRDSEVEMMAQLU3IMWASVC^ 

SPAGQLSRTTEKELLIYLRVATWNQILDPWVYILFRRAVLRRLQPRLSTRPRSLSLQP 
QLTQRSGLQ" 
     repeat_unit     2221..2338 
     repeat_unit     2515..2636 
     polyA_signal    2908..2913 
     polyA_site      2932 
                     /evidence=experimental 
BASE COUNT      521 a    940 c    777 g    694 t 
ORIGIN 



1 gtaatgcaga gataataaaa cttcttaggt ccataggtct tataataatt taataaccta 
61 aacatggtat acaaattcct ccaaacccaa taacataatt atagtttcaa aaagttcccc 
121 aaactttcaa gttagatttt attgctttga tgagtggctt taaatatgaa aagtcttgcc 
181 tgtgaagggc aatccttttc ccgtggactg ggatctatag aaatacagaa atgtgcccag 
241 gggttcatct ccctaataac catcattcac atttctcaac ctccctaata accagccacc 
301 atgtgagaag gatccacagt tactgtttat gactataatt aactagtacc tgggactggt 
361 cagtggagtt ggttgcaacc tgatgctaag gatgtcaaag ttgtctcggc ctctgttccc 
421 agccagtaag taattccctg gcctcgggcc atacccccta atcttggtca gctgattatg 
481 acaggcagac agcacagtaa ataacactat atattaagaa aacccaaagc atatgtatca 
541 atggtatata cccaacagca tcctaggaat ggagagtctg tagcaagggc ctccaatgtg 
601 aaggtcaaca cagtcactgt gatgcgtgta tttccatttt gtaaagcatg atctctggtg 
661 gtcattttta tcttcctaac ttattggaaa agtctcctgt tttgggggcc cgcccctggt 
721 cacagccaga ctgactcagt ttccctggga ggtcccgctc gagcccgtcc tccccctccc 
781 tctgcccgcc cccagccctc gccccaccct cggcgcccgc acatctgcct gctcagctcc 
841 agacggcgcc cggacccccg ggcgcgggat ccagccaggt gggagccccg cagatgaggt 
901 ctctgaaggt gtgcctgaac cagtgccagc ctgccctgtc tgcagcatcg gcctgatggg 
961 gtggtgactg atccctcagg gctccggagc catgtggccc aacggcagtt ccctggggcc 
1021 ctgttcccgg cccacaaaca ttaccctgga ggagagacgg ctgatcgcct cgccctggtt 
1081 cgccgcctcc ttctgcgtgg tgggcctggc ctccaacctg ctggccctga gcgtgctggc 
1141 gggcgcgcgg caggggggtt cgcacacgcg ctcctccttc cccaccttcc tctgcggcct 
1201 cgtcctcacc gacttcctgg ggctgctggt gaccggtacc atcgtggtgt cccagcacgc 
1261 cgcgctcttc gagtggcacg ccgtggaccc tggctgccgt ctctgtcgct tcatgggcgt 
1321 cgtcatgatc ttcttcggcc tgtccccgct gctgctgggg gccgccatgg cctcagagcg 
1381 ctacctgggt atcacccggc ccttctcgcg cccggcggtc gcctcgcagc gccgcgcctg 
1441 ggccaccgtg gggctggtgt gggcggccgc gctggcgctg ggcctgctgc ccctgctggg 
1501 cgtgggtcgc tacaccgtgc aatacccggg gtcctggtgc ttcctgacgc tgggcgccga 
1561 gtccggggac gtggccttcg ggctgctctt ctccatgctg ggcggcctct cggtcgggct 
1621 gtccttcctg ctgaacacgg tcagcgtggc caccctgtgc cacgtctacc acgggcagga 
1681 ggcggcccag cagcgtcccc gggactccga ggtggagatg atggctcagc tcctggggat 
1741 catggtggtg gccagcgtgt gttggctgcc ccttctggtc ttcattgccc agacagtgct 
1801 gcgaaacccg cctgccatga gccccgccgg gcagctgtcc cgcaccacgg agaaggagct 
1861 gctcatctac ttgcgcgtgg ccacctggaa ccagatcctg gacccctggg tgtatatcct 
1921 gttccgccgc gccgtgctcc ggcgtctcca gcctcgcctc agcacccggc ccaggtcgct 
1981 gtccctccag ccccagctca cgcagcgctc cgggctgcag taggaagtgg acagagcgcc 
2041 cctcccgcgc ctttccgcgg agcccttggc ccctcggaca gcccatctgc ctgttctgag 
2101 gattcagggg ctgggggtgc tggatggaca gtgggcatca gcagcagggt tttgggttga 
2161 ccccaatcca acccggggac ccccaactcc tccctgatcc ttttaccaag cactctccct 
2221 tcctcggccc ctttttccca tccagagctc ccaccccttc tctgcgtccc tcccaacccc 
2281 aggaagggca tgcagacatt ggaagagggt cttgcattgc tatttttttt tttagacgga 
2341 gtcttgctct gtcccccagg ctggagtgca gtggcgcaat ctcagctcac tgcaacctcc 
2401 acctcccggg ttcaagcgat tctcctgcct cagcctcctg agtagctggg actataggcg 
2461 cgcgccacca cgcccggcta atttttgtat ttttagtaga gacggggttt caccgtgttg 
2521 gccaggctgg tcttgaactc ctgacctcag gtgattcacc agcctcagcc tcccaaagtg 
2581 ctgggatcac aggcatgaac caccacacct ggccattttt tttttttttt tagacggagt 
2641 ctcactctgt ggcccagcct ggagtacagt ggcacgatct cggctcactg caacctccgc 
2701 ctcccgggtt caagcgattc tcgtgcctca gcctcccgag cagctgggat tacaggcgta 
2761 agccactgcg cccggccttg catgctcttt gaccctgaat tcgacctact tgctggggta 
2821 cagttgcttc cttttgaacc tccaacaggg aaggctctgt ccagaaagga ctgaatgtga 
2881 aacgggggca cccccttttc ttgccaaaat atatctctgc ctttggtttt at 
LOCUS       HUMGP3A      3170 bp    mRNA            PRI       08-NOV-1994 
DEFINITION  Human endothelial membrane glycoprotein IIIa (GPIIIa) mRNA, 
            complete cds. 
ACCESSION   J02703 
NID         g183452 
KEYWORDS    glycoprotein; glycoprotein IIIa. 
SOURCE      Human umbilical vein endothelial cell, cDNA to mRNA. 
  ORGANISM  Homo sapiens 
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; 
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE   1  (bases 1 to 3170) 
  AUTHORS   Fitzgerald,L.A., Steiner,B., Rall,S.C. Jr., Lo,S.S. and 
            Phillips,D.R. 
  TITLE     Protein sequence of endothelial glycoprotein IIIa derived from a 
            cDNA clone. Identity with platelet glycoprotein IIIa and 
            similarity to 'integrin' 
  JOURNAL   J. Biol. Chem. 262 (9), 3936-3939 (1987) 
  MEDLINE   87165991 
COMMENT     Draft entry and computer-readable sequence for [1] kindly provided 
            by L.A.Fitzgerald, 10-FEB-1987. 
            The endothelial membrane glycoprotein IIIa is probably identical 
            to the platelet glycoprotein IIIa. 
FEATURES             Location/Qualifiers 
     source          1..3170 
                     /organism="Homo sapiens" 
                     /db_xref="taxon:9606" 
                     /map="17q21.32" 
     sig_peptide     21..98 
                     /gene="ITGB3" 
                     /note="glycoprotein IIIa signal peptide (putative); 
                     putative" 
     CDS             21..2387 
                     /gene="ITGB3" 
                     /note="glycoprotein IIIa precursor" 
                     /codon_start=1 
                     /db_xref="GDB:G00-120-013" 
                     /db_xref="PID:g306786" 

/ trans la t i on= • MRARPRPRPLWVTVLALGALAGVGVGGPNICTTRGVSSCQQCLA 

VS PMCAWCSDEALPLG SPRCDLKENLLKDNCAPESI EFPVSEARVXiEDRPLSDKGSGD 

SSQVTQVSPQRIALRLRPDDSKNFSIQVllQV^DYPVDIYYLMDLSySMKDDLWSIQNL 

GTKLATQMRKLTSNLRIGFGAFTOKPVSPYMYISPPEALENPCYDMKTTCLPMFGY^ 

VliTLTDQVTRFNEEVKKQSVSRNRDAPEGGFDAIMQATVCDEKIGWROT 

DAKTHIALDGRLAGIVQPNDGQCHVGSDNHYSASTTMDYPSLGLOT 

AVTENVVNLYQNYSELI PGTTVGVljSMDSSNvTiQLIvIJAYGKIRSKv^LEVRDLPEEL 

SLSFNATCI^NEVIPGUCSmGUCIGDTVSFSIEAKTOGCPQEKEKSFTIKPVG 

LIVQVTFDCDCACQAQAEPNSHRCNNGNGTFECGVCRCGPGWLGSQCECSEEDYRPSQ 

QDECSPREGQPVCSQRGECLCGQCVCHSSDFGKITGKYCECDDFSCVRYKGEMCSGHG 

(^SCGDCU:DSDWTGYYCNCTTRTDTa^SSNGLLCSGRGKCECGSCVCIQPGSYGDTC 
EKCPTCPDACTFKKECVECKKFDREPYOTEOT 

YKNEDIX^AmFQYYEDSSGKSILYWEEPECPKGP 

WKLLITIHDRKEFAKFEEERARAKWDTANNPLYKEATSTFTNITYRGT" 
     gene            21..2387 
                     /gene="ITGB3" 
     mat_peptide     99..2384 
                     /gene="ITGB3" 
                     /note="glycoprotein IIIa" 
BASE COUNT      705 a    809 c    909 g    747 t 
ORIGIN      132 bp upstream of SacI site. 

1 cgccgcggga ggcggacgag atgcgagcgc ggccgcggcc ccggccgctc tgggtgactg 
61 tgctggcgct gggggcgctg gcgggcgttg gcgtaggagg gcccaacatc tgtaccacgc 
121 gaggtgtgag ctcctgccag cagtgcctgg ctgtgagccc catgtgtgcc tggtgctctg 
181 atgaggccct gcctctgggc tcacctcgct gtgacctgaa ggagaatctg ctgaaggata 
241 actgtgcccc agaatccatc gagttcccag tgagtgaggc ccgagtacta gaggacaggc 
301 ccctcagcga caagggctct ggagacagct cccaggtcac tcaagtcagt ccccagagga 
361 ttgcactccg gctccggcca gatgattcga agaatttctc catccaagtg cggcaggtgg 
421 aggattaccc tgtggacatc tactacttga tggacctgtc ttactccatg aaggatgatc 
481 tgtggagcat ccagaacctg ggtaccaagc tggccaccca gatgcgaaag ctcaccagta 
541 acctgcggat tggcttcggg gcatttgtgg acaagcctgt gtcaccatac atgtatatct 
601 ccccaccaga ggccctcgaa aacccctgct atgatatgaa gaccacctgc ttgcccatgt 
661 ttggctacaa acacgtgctg acgctaactg accaggtgac ccgcttcaat gaggaagtga 
721 agaagcagag tgtgtcacgg aaccgagatg ccccagaggg tggctttgat gccatcatgc 
781 aggctacagt ctgtgatgaa aagattggct ggaggaatga tgcatcccac ttgctggtgt 
841 ttaccactga tgccaagact catatagcat tggacggaag gctggcaggc attgtccagc 
901 ctaatgacgg gcagtgtcat gttggtagtg acaatcatta ctctgcctcc actaccatgg 
961 attatccctc tttggggctg atgactgaga agctatccca gaaaaacatc aatttgatct 
1021 ttgcagtgac tgaaaatgta gtcaatctct atcagaacta tagtgagctc atcccaggga 
1081 ccacagttgg ggttctgtcc atggattcca gcaatgtcct ccagctcatt gttgatgctt 
1141 atgggaaaat ccgttctaaa gtcgagctgg aagtgcgtga cctccctgaa gagttgtctc 
1201 tatccttcaa tgccacctgc ctcaacaatg aggtcatccc tggcctcaag tcttgtatgg 
1261 gactcaagat tggagacacg gtgagcttca gcattgaggc caaggtgcga ggctgtcccc 
1321 aggagaagga gaagtccttt accataaagc ccgtgggctt caaggacagc ctgatcgtcc 
1381 aggtcacctt tgattgtgac tgtgcctgcc aggcccaagc tgaacctaat agccatcgct 
1441 gcaacaatgg caatgggacc tttgagtgtg gggtatgccg ttgtgggcct ggctggctgg 
1501 gatcccagtg tgagtgctca gaggaggact atcgcccttc ccagcaggac gagtgcagcc 
1561 cccgagaggg tcagcccgtc tgcagccagc ggggcgagtg cctctgtggt caatgtgtct 
1621 gccacagcag tgactttggc aagatcacgg gcaagtactg cgagtgtgac gacttctcct 
1681 gtgtccgcta caagggggag atgtgctcag gccatggcca gtgcagctgt ggggactgcc 
1741 tgtgtgactc cgactggacc ggctactact gcaactgtac cacgcgtact gacacctgca 
1801 tgtccagcaa tgggctgctg tgcagcggcc gcggcaagtg tgaatgtggc agctgtgtct 
1861 gtatccagcc gggctcctat ggggacacct gtgagaagtg ccccacctgc ccagatgcct 
1921 gcacctttaa gaaagaatgt gtggagtgta agaagtttga ccgggagccc tacatgaccg 
1981 aaaatacctg caaccgttac tgccgtgacg agattgagtc agtgaaagag cttaaggaca 
2041 ctggcaagga tgcagtgaat tgtacctata agaatgagga tgactgtgtc gtcagattcc 
2101 agtactatga agattctagt ggaaagtcca tcctgtatgt ggtagaagag ccagagtgtc 
2161 ccaagggccc tgacatcctg gtggtcctgc tctcagtgat gggggccatt ctgctcattg 
2221 gccttgccgc cctgctcatc tggaaactcc tcatcaccat ccacgaccga aaagaattcg 
2281 ctaaatttga ggaagaacgc gccagagcaa aatgggacac agccaacaac ccactgtata 
2341 aagaggccac gtctaccttc accaatatca cgtaccgggg cacttaatga taagcagtca 
2401 tcctcagatc attatcagcc tgtgccagga ttgcaggagt ccctgccatc atgtttacag 
2461 aggacagtat ttgtggggag ggatttcggg gctcagagtg gggtaggttg ggagaatgtc 
2521 agtatgtgga agtgtgggtc tgtgtgtgtg tatgtggggg tctgtgtgtt tatgtgtgtg 
2581 tgttgtgtgt gggagtgtgt aatttaaaat tgtgatgtgt cctgataagc tgagctcctt 
2641 agcctttgtc ccagaatgcc tcctgcaggg attcttcctg cttagcttga gggtgactat 
2701 ggagctgagc aggtgttctt cattacctca gtgagaagcc agctttcctc atcaggccat 
2761 tgtccctgaa gagaagggca gggctgaggc ctctcattcc agaggaaggg acaccaagcc 
2821 ttggctctac cctgagttca taaatttatg gttctcaggc ctgactctca gcagctatgg 
2881 taggaactgc tggcttggca gcccgggtca tctgtacctc tgcctccttt cccctccctc 
2941 aggccgaagg aggagtcagg gagagctgaa ctattagagc tgcctgtgcc ttttgccatc 
3001 ccctcaaccc agctatggtt ctctcgcaag ggaagtcctt gcaagctaat tctttgacct 
3061 gttgggagtg aggatgtctg ggccactcag gggtcattca tggcctgggg gatgtaccag 
3121 catctcccag ttcataatca caacccttca gatttgcctt attggcagcg 
LOCUS       HUMPLG2B     3303 bp    mRNA            PRI       07-JAN-1995 
DEFINITION  Human platelet membrane glycoprotein IIb (ITGA2B) mRNA, complete 
            cds. 
ACCESSION   J02764 
NID         g190067 
KEYWORDS    membrane adhesive protein; platelet membrane glycoprotein; 
            platelet receptor. 
SOURCE      Human HEL cell, cDNA to mRNA. 
  ORGANISM  Homo sapiens 
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; 
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE   1  (bases 1 to 3303) 
  AUTHORS   Poncz,M., Eisman,R., Heidenreich,R., Silver,S.M., Vilaire,G., 
            Surrey,S., Schwartz,E. and Bennett,J.S. 
  TITLE     Structure of the platelet membrane glycoprotein IIb. Homology to 
            the alpha subunits of the vitronectin and fibronectin membrane 
            receptors 
  JOURNAL   J. Biol. Chem. 262 (18), 8476-8482 (1987) 
  MEDLINE   87250457 
COMMENT     Draft entry and computer-readable sequence [1] kindly provided by 
            M.Poncz, 15-APR-1987. 
FEATURES             Location/Qualifiers 
     source          1..3303 
                     /organism="Homo sapiens" 
                     /db_xref="taxon:9606" 
                     /map="17q21.32" 
     mRNA            <1..3303 
                     /gene="ITGA2B" 
                     /note="G00-120-012" 
     gene            1..3303 
                     /gene="ITGA2B" 
     sig_peptide     2..94 
                     /gene="ITGA2B" 
                     /note="G00-120-012" 
     CDS             2..3121 
                     /gene="ITGA2B" 
                     /codon_start=1 
                     /db_xref="GDB:G00-120-012" 
                     /product="platelet membrane glycoprotein IIb" 
                     /db_xref="PID:g190068" 

/ translation="MARAI^PLQALV^LEVA^LLIX5PCAAPPAWALNIJ5PVQLTFYAG 

PNGSQFGFSLDFHKDSHGRVAIWGAPRTLGPSQEETGGVFLCPWRAEGGQCPSLLFD 

LRDETRNVGSQTLQTFKARQGU3ASWSWSDVIVACAPWQHWNVXEKTEEAEKT 

CFIJVQPESGRRAEYSPCRGNTLSRIYVENDFSWDKRYCEAGFSSVVTQAGELVLGAPG 

GYYFLGLLAQAPVADIFSSYRPGILLWHVSSQSLSFDSSNPEYFDGYWGYSVAVGEFD 

GDLNTTEYWGAPTWSWTTX^VEILDSYYQRiaRLRAEQMASYFGHSVAVT 

HDLLVGAPLYMESRADRKLAEVGRVYLFI^PRGPHALGAPSLLLTGTQLYGRFGSAIA 

PIX3DLDRDGYNDIAVAAPYGGPSGRGQVXWI^QSEGIJISRPSQVU3SPFPTGSAFGF 

SLRGATOIDDNGYPDLrVGAYGANQVAVTRAQPWKASVQLIjVQDSLNPAvl^ 

TKTPVSCFNIQMCVGATGHNI PQKLSLNAELQUDRQKPRQGRRVLIJXSSQQAGTTL^L 

DLGGKHSPICHTTMAFLRDEADFRDKLSPrVLSLNVSLPPTEAGM^ 
EQTRIVLDSGEDDVCVPQLQLTASVTGSPLLVGADNVLELQMDAANEGEGAYEAELAV 
HLPQGAHYMRALSNVEGFERLICNQKKENETRW^ 

LEEAGESVSFQLQIRSKNSQNPNSKIVLLDVPVRAEAQVELRGNSFPASLWAAEEGE 
REQNSlJ^SWGPKVEHTYEIJiNNGPGTVNGLHLSIHLPGQSQPSDLLYIL^ 
CFPQPPVNPLKVDWGLPIPSPSPIHPAHHKRDRRQIFLPEPEQPSRLQDPVLVSCDSA 
PCTWQCDI^EMARGQRAMVTVIAFLWLPSLYQRPLDQ 

LSLPRGEAQVWTQLLRALEERAIPIWWVLVGVLGGI^LLTILVLAMWKVGFFKRNR 
LEEDDEEGE" 
     mat_peptide     95..3118 
                     /gene="ITGA2B" 
                     /note="G00-120-012" 
                     /product="platelet membrane glycoprotein IIb" 
BASE COUNT      618 a    997 c   1026 g    662 t 
ORIGIN      Unreported. 

1 gatggccaga gctttgtgtc cactgcaagc cctctggctt ctggagtggg tgctgctgct 
61 cttgggacct tgtgctgccc ctccagcctg ggccttgaac ctggacccag tgcagctcac 
121 cttctatgca ggccccaatg gcagccagtt tggattttca ctggacttcc acaaggacag 
181 ccatgggaga gtggccatcg tggtgggcgc cccgcggacc ctgggcccca gccaggagga 
241 gacgggcggc gtgttcctgt gcccctggag ggccgagggc ggccagtgcc cctcgctgct 
301 ctttgacctc cgtgatgaga cccgaaatgt aggctcccaa actttacaaa ccttcaaggc 
361 ccgccaagga ctgggggcgt cggtcgtcag ctggagcgac gtcattgtgg cctgcgcccc 
421 ctggcagcac tggaacgtcc tagaaaagac tgaggaggct gagaagacgc ccgtaggtag 
481 ctgctttttg gctcagccag agagcggccg ccgcgccgag tactccccct gtcgcgggaa 
541 caccctgagc cgcatttacg tggaaaatga ttttagctgg gacaagcgtt actgtgaagc 
601 gggcttcagc tccgtggtca ctcaggccgg agagctggtg cttggggctc ctggcggcta 
661 ttatttctta ggtctcctgg cccaggctcc agttgcggat attttctcga gttaccgccc 
721 aggcatcctt ttgtggcacg tgtcctccca gagcctctcc tttgactcca gcaacccaga 
781 gtacttcgac ggctactggg ggtactcggt ggccgtgggc gagttcgacg gggatctcaa 
841 cactacagaa tatgtcgtcg gtgcccccac ttggagctgg accctgggag cggtggaaat 
901 tttggattcc tactaccaga ggctgcatcg gctgcgcgca gagcagatgg cgtcgtattt 
961 tgggcattca gtggctgtca ctgacgtcaa cggggatggg aggcatgatc tgctggtggg
1021 cgctccactg tatatggaga gccgggcaga ccgaaaactg gccgaagtgg ggcgtgtgta 
1081 tttgttcctg cagccgcgag gcccccacgc gctgggtgcc cccagcctcc tgctgactgg 
1141 cacacagctc tatgggcgat tcggctctgc catcgcaccc ctgggcgacc tcgaccggga 
1201 tggctacaat gacattgcag tggctgcccc ctacgggggt cccagtggcc ggggccaagt 
1261 gctggtgttc ctgggtcaga gtgaggggct gaggtcacgt ccctcccagg tcctggacag
1321 ccccttcccc acaggctctg cctttggctt ctcccttcga ggtgccgtag acatcgatga 
1381 caacggatac ccagacctga tcgtgggagc ttacggggcc aaccaggtgg ctgtgtacag 
1441 agctcagcca gtggtgaagg cctctgtcca gctactggtg caagattcac tgaatcctgc 
1501 tgtgaagagc tgtgtcctac ctcagaccaa gacacccgtg agctgcttca acatccagat 
1561 gtgtgttgga gccactgggc acaacattcc tcagaagcta tccctaaatg ccgagctgca 
1621 gctggaccgg cagaagcccc gccagggccg gcgggtgctg ctgctgggct ctcaacaggc 
1681 aggcaccacc ctgaacctgg atctgggcgg aaagcacagc cccatctgcc acaccaccat 
1741 ggccttcctt cgagatgagg cagacttccg ggacaagctg agccccattg tgctcagcct 
1801 caatgtgtcc ctaccgccca cggaggctgg aatggcccct gctgtcgtgc tgcatggaga 
1861 cacccatgtg caggagcaga cacgaatcgt cctggactct ggggaagatg acgtatgtgt
1921 gccccagctt cagctcactg ccagcgtgac gggctccccg ctcctagttg gggcagataa
1981 tgtcctggag ctgcagatgg acgcagccaa cgagggcgag ggggcctatg aagcagagct
2041 ggccgtgcac ctgccccagg gcgcccacta catgcgggcc ctaagcaatg tcgagggctt 
2101 tgagagactc atctgtaatc agaagaagga gaatgagacc agggtggtgc tgtgtgagct 
2161 gggcaacccc atgaagaaga acgcccagat aggaatcgcg atgttggtga gcgtggggaa 
2221 tctggaagag gctggggagt ctgtgtcctt ccagctgcag atacggagca agaacagcca
2281 gaatccaaac agcaagattg tgctgctgga cgtgccggtc cgggcagagg cccaagtgga 
2341 gctgcgaggg aactcctttc cagcctccct ggtggtggca gcagaagaag gtgagaggga 
2401 gcagaacagc ttggacagct ggggacccaa agtggagcac acctatgagc tccacaacaa 
2461 tggccctggg actgtgaatg gtcttcacct cagcatccac cttccgggac agtcccagcc 
2521 ctccgacctg ctctacatcc tggatataca gccccagggg ggccttcagt gcttcccaca 
2581 gcctcctgtc aaccctctca aggtggactg ggggctgccc atccccagcc cctcccccat 
2641 tcacccggcc catcacaagc gggatcgcag acagatcttc ctgccagagc ccgagcagcc 
2701 ctcgaggctt caggatccag ttctcgtaag ctgcgactcg gcgccctgta ctgtggtgca
2761 gtgtgacctg caggagatgg cgcgcgggca gcgggccatg gccacggcgc tggccttcct
2821 gtggctgccc agcctctacc agaggcctct ggatcagttt gtgctgcagt cgcacgcatg
2881 gttcaacgtg tcctccctcc cctatgcggt gcccccgctc agcctgcccc gaggggaagc
2941 tcaggtgtgg acacagctgc tccgggcctt ggaggagagg gccattccaa tctggtgggt
3001 gctggtgggt gtgctgggtg gcctgctgct gctcaccatc ctggtcctgg ccatgtggaa
3061 ggtcggcttc ttcaagcgga accggccacc cctggaagaa gacgatgaag agggggagtg
3121 atggtgcagc ctacactatt ctagcaggag ggttgggcgt gctacctgca ccgccccttc
3181 tccaacaagt tgcctccaag ctttgggttg gagctgttcc attgggtcct cttggtgtcg
3241 tttccctccc aacagagctg ggctaccccc cctcctgctg cctaataaag agactgagcc
3301 ctg
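
A quick consistency check on a record like the one above is to confirm that the four figures on its BASE COUNT line add up to the sequence length given in the feature table (1..3303). The following is a minimal sketch in Python; the helper name `parse_base_count` is hypothetical and is not part of any GenBank tooling.

```python
import re

def parse_base_count(line: str) -> dict[str, int]:
    """Parse a GenBank 'BASE COUNT' line into a {base: count} mapping."""
    # Each entry on the line is a number followed by a single base letter.
    return {base: int(n) for n, base in re.findall(r"(\d+)\s+([acgt])\b", line)}

# BASE COUNT line copied from the ITGA2B record above.
counts = parse_base_count("BASE COUNT 618 a 997 c 1026 g 662 t")
total = sum(counts.values())
print(counts, total)
```

If the total differs from the stated length, either the count line or the sequence block was damaged in transcription.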
LOCUS HUMTFPB 13865 bp DNA PRI 14-JAN-1995

DEFINITION Human tissue factor gene, complete cds.

ACCESSION J02846
NID g339505

KEYWORDS Alu repeat; cell surface integral membrane protein; cell surface
receptor; tissue factor.

SOURCE Human DNA, clones lambda-TF[559,679,753,885,1377].
ORGANISM Homo sapiens

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 13865)

AUTHORS Mackman,N., Morrissey,J.H., Fowler,B. and Edgington,T.S.
TITLE Complete sequence of the human tissue factor gene, a highly
regulated cellular receptor that initiates the coagulation
protease cascade
JOURNAL Biochemistry 28 (4), 1755-1762 (1989)
MEDLINE 89247359

COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by J.H. Morrissey, 25-OCT-1988.

FEATURES Location/Qualifiers
source 1..13865
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="1p22-p21"
prim_transcript 799..13232
/note="TF mRNA and introns"
CDS join(922..1021,2190..2301,6392..6591,9289..9467,
10075..10234,11955..12091)
/gene="F3"
/note="tissue factor"
/codon_start=1
/db_xref="GDB:G00-119-895"
/db_xref="PID:g339506"

/translation="METPAWPRVPRPETAVARTLLLGWVFAQVAGASGTTNTVAAYNL
TWKSTNFKTILEWEPKPVNQVYTVQISTKSGDWKSKCFYTTDTECDLTDEIVKDVKQT
YLARVFSYPAGNVESTGSAGEPLYENSPEFTPYLETNLGQPTIQSFEQVGTKVNVTVE
DERTLVRRNNTFLSLRDVFGKDLIYTLYYWKSSSSGKKTAKTNTNEFLIDVDKGENYC
FSVQAVIPSRTVNRKSTDSPVECMGQEKGEFREIFYIIGAVVFVVIILVIILAISLHK
CRKAGVGQSWKENSPLNVS"
exon <922..1021
/gene="F3"
/note="tissue factor"
/number=1
gene join(922..1021,2190..2301,6392..6591,9289..9467,
10075..10234,11955..12091)
/gene="F3"
intron 1022..2189
/note="TF intron A"
exon 2190..2301
/gene="F3"
/number=2
intron 2302..6391
/note="TF intron B"
repeat_region 6127..6241
/note="Alu repeat partial copy A"
exon 6392..6591
/gene="F3"
/number=3
intron 6592..9288
/note="TF intron C"
repeat_region 8391..8677
/note="Alu repeat copy B"
exon 9289..9467
/gene="F3"
/number=4
intron 9468..10074
/note="TF intron D"
exon 10075..10234
/gene="F3"
/number=5
intron 10235..11954
/note="TF intron E"
repeat_region 10954..11249
/note="Alu repeat copy C"
exon 11955..>12091
/gene="F3"
/note="tissue factor"
/number=6
repeat_region 12458..12757
/note="Alu repeat copy D"
BASE COUNT 3711 a 2955 c 3240 g 3959 t 

ORIGIN 1 bp upstream of EcoRI site; chromosome 1. 

1 gaattctccc agaggcaaac tgccagatgt gaggctgctc ttcctcagtc actatctctg 
61 gtcgtaccgg gcgatgcctg agccaactga ccctcagacc tgtgagccga gccggtcaca 
121 ccgtggctga caccggcatt cccaccgcct ttctcctgtg cgacccgcta agggccccgc 
181 gaggtgggca ggccaagtat tcttgacctt cgtggggtag aagaagccac cgtggctggg 
241 agagggccct gctcacagcc acacgtttac ttcgctgcag gtcccgagct tctgccccag 
301 gtgggcaaag catccgggaa atgccctccg ctgcccgagg ggagcccaga gcccgtgctt 
361 tctattaaat gttgtaaatg ccgcctctcc cactttatca ccaaatggaa gggaagaatt
421 cttccaaggc gccctccctt tcctgccata gacctgcaac ccacctaagc tgcacgtcgg 
481 agtcgcgggc ctgggtgaat ccgggggcct tgggggaccc gggcaactag acccgcctgc 
541 gtcctccagg gcagctccgc gctcggtggc gcggttgaat cactggggtg agtcatccct 
601 tgcagggtcc cggagtttcc taccgggagg aggcggggca ggggtgtgga ctcgccgggg 
661 gccgcccacc gcgacggcaa gtgacccggg ccgggggcgg ggagtcggga ggagcggcgg 
721 gggcgggcgc cgggggcggg cagaggcgcg ggagagcgcg ccgccggccc tttatagcgc 
781 gcggggcacc ggctccccaa gactgcgagc tccccgcacc ccctcgcact ccctctggcc 
841 ggcccagggc gccttcagcc caacctcccc agccccacgg gcgccacgga acccgctcga 
901 tctcgccgcc aactggtaga catggagacc cctgcctggc cccgggtccc gcgccccgag 
961 accgccgtcg ctcggacgct cctgctcggc tgggtcttcg cccaggtggc cggcgcttca
1021 ggtgagtggc accagcccct ggaagcccgg ggcgcgccac acgcaggagg gaggcgacag 
1081 tcct 95ctgg cagcgggctc gccctggttc cccggggcgc ccatgttgtc ccccgcgcct
1141 acgggactcg gctgcgctca cccagcccgg cttgaatgaa ccgagtccgt cgggcgccgg 
1201 cgggagttgc agggagggag ttggcgcccc agaccccgct gccccttccg ctggagagtt 
1261 ttgctcgggg tgtccgagta attggactgt tgttgcataa gcggactttt agctcccgct
1321 ttaactctgg ggaaagggct tcccagtgag ttgcgacctt caatatgata ggacttgtgc 
1381 ctgcgtctgc acgtgttggc gtgcagaggt ttggatatta tctttcatta tatgtgcatc
1441 ttcccttaat aaagagcgtc cctggtcttt tcctggccat ctttgttcta ggtttgggta 
1501 gaggcaatcc aaaagggctg gattgctgct tagattggag caggtacaac gttgtgcatg 
1561 ccccgtattt ctacgaggtg ttcgggacgg cgtagagact gggacctgct gcgtactggc 
1621 aaagcagacc ttcataagaa ataatcctga tccaatacag ccgacggtgt gacaggccac 
1681 acgtccccgt gggtctctgt ggaagtttca gtgtagcgac atttcagata aaagtggaaa 
1741 aagtgaagtt tggctttttt catttgtatg cagtcctaac tcttgtcaca cgtgtgggat 
1801 ttatcttttt ccataactta ctgaaaaccc ttcctggcgg gctgaacctg actcttcctg 
nooi a ^ ct 9 a 9tcc tggactggca cactgatggc tctgggctct tcccggtcaa gttataacaa 
1921 ggctttgccc atgaataatt tcaaacgaaa atgtcaagat ccttgccggt gtcctgggat 
1981 tacaaggtga atcttgtcat gaagaaattc taggtctaga aaaaatttga agattctttt
2041 tctcttgata attcactaat gaagcttttg tggttgaaaa ataaaaagtg aggtttatgg 
2101 tgatgtcagg tgggaaggtg ttttatacat caatacattc gagtgctctg aagtgcatgt
2161 aataatagct gtttctctgt tgtttaaagg cactacaaat actgtggcag catataattt 
2221 aacttggaaa tcaactaatt tcaagacaat tttggagtgg gaacccaaac ccgtcaatca 
2281 agtctacact gttcaaataa ggtaagctgg gtacagaaaa agaaaattaa ggtctttgat 
"JJ STtttctactg tcctatgctg aacaagaatg tctttaaagc tgattactgg atgaaattat 
2401 ttaacagatg acgaagaaga agggattctt ggcaattcgc tggccggtgt catactctat 
2461 taggcctgca acatttccag accttaaact gatagaacat tttaattgtt ttaattgttt
2521 ttggaaatga tgggagagtt cctaagtgga gtataaactg tggagagatg aaccatcttg
2581 agtaggcact gaagtgtgct ttgggtcatg atagattaat taatctcatc taaacattga
2641 tgtctttttc cgttgctgtc tagactgtga acaatgtcta acaccttagg gaagaggtgg
2701 ggaggaatcc caatgtatac attgccctta agcagtgttt gattcattca tctttggact
2761 ccatgaatcg aaatctggta gaatacatga tcttagtgga ggaggccaaa tgcgtgactc
2821 actgagcctg gcagagcaga aatactctgc tgtctgcacc ctctgggtct ggtgtggctc
2881 tgcttcttgg tgcttcaact ctgactggca gctgtcccca ggaggcgata attcagcatg
2941 ttcaatctaa aggttatgac ttccttgatg gttttcacca tattcttggc aagtttttgg
3001 tttttgaaat gttctaggag gcttggtaga gatcttatga aatagagaat agctgctgtg
3061 gaaattattt taatgctaat tacataaaag tacaaaagta gcactagcta aaacaaaagg
3121 tattttgctg ttctgttttg ttttagcttg tgccaggcct tttacagcat taggaatgca
3181 acttctagat aacgatgcat cttttaagtg aatgttcttg tttttcaaaa tgaacttcat
3241 gacagtagtt gccaaaccag caaggagaac ttgcatgcat acgtgcatgc atgtgtggat
3301 atgtatgggg gtggggggag agaaagatga aggaatttca taacatgaaa taatgattac
3361 agttctggtc aaacttgtca attcagattt caccaattga gaattagtaa gtaatttctc
3421 tgatacaggc ctgaagttta ccttagtaaa cactttactt ccatatggta aaaattagat
3481 tttgggagga atgcttacct cctaaatata ttcaatctaa tatttgagga cacatgggaa
3541 tatatttatg attcatctgc tttttaaaca taagcctttg ttaactgtaa gttcttgaac
3601 tttataaggc tgctgttatt taaatgagca cagctcctga tctgcaaaca gcagagcgca
3661 gggctacagc ttgggggatg ccagccgact cagggtggtc ctgtggactg aacaatctct
3721 tgctgctgta ctggagggcc tgggagcttt tccatcagcc tcggcctgag gtgtgcactc
3781 ttctcctgcc caccccagga ataaatgaga ttcctggtta aaaaggacca gagcagtcat
3841 tttacagttg aggaaactgt tgctctgaga agtgagggat ttattcatga ctacactgat
3901 ggtgagtgcc catgtcaggt ctggaaccaa agtctaccca gtatccacac accaccatcc
3961 ctcaggtggc tctgccacag tctgatggga ggctccaaag cgggaggaag aaggaaagtc
4021 ttgcccactg catctcctca gttggccttc ctctctgcct gttttccctc cctacagtta
4081 gcatcttaag cagctgcctc tcttccctcc cgactgctct cactactgca gcctggctcc
4141 agccgcagga cactactgct gtgcagaagc ccctacttgg aactccaact gcatttttca
4201 cctttgctaa cagttttcag tggtggttgg gaaatgttat tggcttaagc cttagcacaa
4261 accgtcaccg gtgatattca ttccatggaa atgttctgaa ttctaaagct gaatttacaa
4321 agcttctgga aaacaacctg caaccaaatt agtgactgaa ttttttagtt aactcaaaat
4381 tccaaatcag agggttttgc aatgcctgga ggaaccttgg aggcttttaa agtgttaatg
4441 ctattaatgg cattcagagg gattttctac agaattgtcc cttcattacc tgtttataca
4501 gttttactac ttaccagggt actgtataaa tccttgtgct aaattttgct atagagtatg
4561 tggtccctgc tgtgagctgg gaggaaccaa atactgtatc tctatgttac atagaaagcc
4621 ctaggagact ttctcctgtt atctgaacaa ctatttgctg tactgataaa aaggaaacag
4681 catagtctca ttcacttttt gaaatggaaa tgataaaata aaacacattt tggtcattcg
4741 ggaacaaaat accctctcta cttttatcac ataaaattaa ataaatagaa accaaaatat
4801 ttcagtatca atcttagttt gtgcacttta ggataaagaa tgtgtttacc caaatccttt
4861 tggcctggtt acttagttca gattttgaaa gaaaatatat ttgtggcttt tatgtgtgaa
4921 tttagacaat ggaatccatg tggtgcctcg ttttccctga gattatgtat taattcaacc
4981 tgtaaatgca aaccatctaa tagtcagcga gaccctatag ccctgctgct taatgggggc
5041 acacaagggc atgcagccct cgtaccaggc agactgtgtt catattaaca gcatcgtgga
5101 gaaactcatg ctgggggaca ggggagggag atgtaaatgc tcagcaggga gatctggaga
5161 ttcctggagc aggtggagtt gggacctggc cttgaacgat gggtctggct ctggcagtca
5221 gtaatgccaa agggaagagc agcataactg tcactttcca tgggacagaa gtgtgtgaat
5281 caagttgcag tgacgcttca cctatttatt attttggtca tttagaagaa tttcattgtc
5341 agtagaagtc ctttaaatca tttccccttc agtgacgtct cacaaaaaaa agatctgtct
5401 ttagcttttt agtctcagac tttattagac agatactacc tgtactctta ttctgtaatc
5461 tttgttggga tggattcaca tcttgcaaag gaagggaggc atgtagtata atggggcaaa
5521 cagacccagc tctgccactc gttagatatg tgaccttctg caagttgctt agtgcctgtg
5581 agcttcagtg tcctcatgga taagaaagat ccaacacctt cttggaagga ttatatcaaa
5641 tgaagtaaca tgagtaaagg gtccagcaga atacctggca tatagtggag tcaatgaatg
5701 attaataata ttattaatag tggtcatgag agatatatgt ataacatgtt attatgtaga
5761 ctcactatat agactctatt ctacatagaa tatagaacat tatataacaa acaactataa
5821 taagtagact atagtaaaca acctcacttt gtctcagttg cctcatcttg atggaaaact
5881 gctctttctc tcctgttacc ctgacagaga gcgtctacat tctaaaagaa agatatttaa
5941 caaaatggtt gagtacagat ccaagagtca aatagctgtc tggttcaaag tccagctgtg
6001 tgattttgag ctagtcaccc aatctcactt tgtctcagta gccttatttg taaaaacaag
6061 gcaaattaca gagccatccc ctgggttgct atgaggactc aaacatgcat cccaagtgct
6121 cggtgttgct aggtatgatg gctcacacct gtacattcag cactttggga ggccgaagca
6181 gaaggatcag cctgggcaac atagcaggac cccatctcta caaaacaatg tttaaaaaaa
6241 agcaaagtgc tcagcacagt gactgcatca ttaggattga ttgtagggct cctgatgtta
6301 gcacagaaca ccacagccag gaagcagtct atcttgttgg gtgcaaattg taacattcca
6361 tttatgtttc ttccttcttt tctttcttta gcactaagtc aggagattgg aaaagcaaat
6421 gcttttacac aacagacaca gagtgtgacc tcaccgacga gattgtgaag gatgtgaagc
6481 agacgtactt ggcacgggtc ttctcctacc cggcagggaa tgtggagagc accggttctg
6541 ctggggagcc tctgtatgag aactccccag agttcacacc ttacctggag agtaagtggc
6601 ttgggctgta ataccgttca ttcttgttag aaacgtctga acattctcgt gatcttgtgc
6661 ctttaggggc tacaaaatta aaaatattta ttcttttttt ctcagaaact ggtatgtatc
6721 acagccctct tcacacattc cagatgtggt aggaggttca cagaatgtga acttttggag
6781 ctgatgacag tgtcatcaag taactttctc ccccagtctg tccccagacc ctgttactgt
6841 cctcagtaag cggctgaatg tgtgttggga gagggcgggc cagggaagcg ggtagggata
6901 ggaaatccac caaggccggg gttttagctt ttccctatat atacatcatg tatcctgatt
6961 tttctgtccc gttatcacac taaaaatccc agttgaggat ttttcccaaa cggtcataaa
7021 tcaatgagga aagtccatgg tttccctctg agcccataat tagcctaatt atgctgacct
7081 tttccaatca gttggccatg atttgagttc cgtgatgtgc cagcacctgc ccagccatct
7141 gcctgtcacc ctcgttctgg ttttggaaag gtggaatact ttcctcctca gcctttgccc
7201 ctgtaagctg gccctaggag ccagtaaaag aatgaagaga attcctgtca agtaggagat
7261 ttattctttt gccgcaactg tggctctgag ctaggcaatt tagataaatg catgtagcac
7321 attgagcaga gtgaaattag cttctcttgt aaggccagct ggttagaatg aaggtgttgt
7381 gtgagtgtta ggcccagcga gagagaacag tttctcaagg taggaatggt gaaaagaagg
7441 ggtggacgga caaccaacca accatcctcc tctggtatct actttgaggg ttgaaatagg
7501 gggcctgacc ccaggtgaat gtggctgcct tcccagagcc cccatttgca agaccctcca
7561 gacccccagg tgcttctgct tgtgtctttt gtggcaccag gcaagaatgc agcagcgtca
7621 gcagcccctc tggtgactgt ggcatggttg acattcattt cccccctaat taatggcatc
7681 ctcacgattc tcttttatat taatagttct tgagtttttt tgtaagctac txcaaatcct
7741 ttgttggtgc aagatagaag atattttatg tgtttgtttt gcatgtgcac acacatattt
7801 ggcctgtgaa ttgatgtttg ttttcctgtc atttaaccaa agcacatgag ataattgagc
7861 cattgcagag accccgtggt taaatccggc ttctcgaggt accaaggaca tttcctgggc
7921 tttctcacag ccctacatat ttttgaacct aaaatatcgt agtttatgct accaccctgt
7981 tcagtatagt agccactagc cacatgtggc tgttgaccac ttgaaatatg gctaatgctc
8041 taagtataaa gtacacactg gaatttaaga agtgtagaat atctcaaaac ttttttatat
8101 tgattacaca ttaaaatgat tatattccag atatatgcag ttgactcaag caatgcatgg
8161 ctgagaggca ccgactccct gtgcagttga aaatccgagt ataacttgac tccccaaaaa
8221 cttaactact aatagcctac ctatcggttg actgttgact gcagccttac caataagata
8281 aacagtcaat taacacacat ttttcatgtt gcgtgtatta tatactgtat tcttacaata
8341 aagtaagcta gaggaaagaa aatgttatta agaaaattat aaggaaaaga ggctgggcat
8401 ggtggctcgt gcctgtaatc tcagaacttt gggatgctaa ggcgggtgga tcacttgagg
8461 tcaggagttc aagaccagcc tggccaacat ggtgaaaccc catctctacc aaaaatacaa
8521 aaattagcca ggcgtggttg tgggtgcctg taatcccagc tacttgggag gctgaggcag
8581 gagaatcact tcgacccagg tggaggaggt tgcagtgaac tgagattgcg ccactgcact
8641 ccggcctggg tgacagagcg agactctgtc taaaaaagaa agggaaagaa agaaaaaaaa
8701 gaaaagaaaa gaaaagaaag aaggaaggaa gagaaagaat tataaggaag agaaaatata
8761 tttactattg ataaagtgga agtggatcat cataaaggtg ttcatcctcg tcatcttcat
8821 gttgagtagg ctgaggagga ggaggaggag gaagagcagg ggccacggca ggagaaaaga
8881 tggaggaagt aggaggcggc acacttggtg taacttttat ttaaaaaaat ttgcatacaa
8941 gtggatccac agagttcaaa cccatgttgt tcaggggtca actgtctttg gttaaataaa
9001 atatattatt aaaattaatt tcacctgttc ctttttactt tttctaatgt gactactaga
9061 aaacttaaaa tgacatctga ggctccattg tcttcccctt gggccagcac taccacagaa
9121 tgtcttagga ttcagctcca ggccgccacg cctgcttctt tcagggagct ggttctatgc
9181 acatgtttta tatgagagat aattaagttg tcaattgtga taacaaaaca ggatttgact
9241 ttgtacagaa ttctttggtt ccaaccaagc tcatttcctt tgtttcagca aacctcggac
9301 agccaacaat tcagagtttt gaacaggtgg gaacaaaagt gaatgtgacc gtagaagatg
9361 aacggacttt agtcagaagg aacaacactt tcctaagcct ccgggatgtt tttggcaagg
9421 acttaattta tacactttat tattggaaat cttcaagttc aggaaaggtg agcatttttt
9481 aatttgtttt tatgacctgt tttaaattgt gaatacttgg ttttacaacc catttcttcc
9541 ccaattcaaa aatagcagaa cagagttgtt gagaaggtga tggagtagaa gggggagcgc
9601 gcactgtggg gaggggtgga caacaggcct ggtcctacct gtgactctgc actaccctgt
9661 gactctggca gggccccctc ggagacccag gttcctcagc caaccggctg gatcaggtca
9721 tctctaaagg tcccgccacg ctcacatttc tccctctatt gaggatccca ggcacaaaat
9781 ttgtttttgg ttcaatgcat aatactccct tcctttttct tttactgcag atatcttcta
9841 aaggggctca atagggttca atatgcctaa attggatctt ctcagtcttg gaaaaggcat
9901 ttttagcagt gatcaaggga aactgattag cgaagtcact tctaatcctt cacgtgtcag
9961 ctgtgttctt gtaggctttg cttagaacct aggtttttac ttccacagtg acttaataaa
10021 ggggaaagaa ttgactcaga gcccagatga attaagaact ctatcttttt acagaaaaca
10081 gccaaaacaa acactaatga gtttttgatt gatgtggata aaggagaaaa ctactgtttc
10141 agtgttcaag cagtgattcc ctcccgaaca gttaaccgga agagtacaga cagcccggta
10201 gagtgtatgg gccaggagaa aggggaattc agaggtgagt ggctctgcca gccatttgcc

FIG. 8D
10261 tgggggtatg ggtgctgtgg gtgacttctg gaggagtagc tccaccctca gggctgggat
10321 atacttcctt ggttaaatat tcaggaaaac aaactgcctg gaggtttttt gttgttattt
10381 gtttgttttg gttttgattt tgctttggta caaaaaagat tttggacatt tagaaatgtt
10441 tctgtgttga ttgtgccctt gtattagcag gtgttttctt gagcacctgt catgtgctaa
10501 gccctctgct gagcactgga tacacaaact gtgtttagga tttagcaaca agtcacagat
10561 ttccctgggc attttttcat gcttaaattc taattctggg ggtggcttct ggaccagctg
10621 caacaggaca cagtagacat tcgtgagtac ccactgtggg ctgttgccac agaggctgta
10681 gagtctaacc catcaaggga agggattgag tatatcaaat atacccacat gcatgcatgt
10741 gtgtatatgg cggacacgtg tgtgtacatg catgtgcata tgttgggagc tcaggcccat
10801 tgtgcgagga acagtcccta accggaagtg ctgtgggcct tcagactctt gcaggaagct
10861 gcaagcctgt gtgtctcgat ccatgcctta cagggaaagt attctgagta ctttcagtga
10921 agaaaagagt caggggatat aaacgatggc ttacgctggg tgtggtggct cacgcctgta
10981 gtccctgcac tttgggaggc ccagacaggc aaatcacttg aggtcaggag tttgggacca
11041 gcctggccaa catggtaaaa gcccatctct actcaaaata caaaaagtag ctgggtgtgg
11101 ttgcacgtgt ctgtagtccc agctactcag gaggttgagg caggagaatt gcttgaacct
11161 gggaggcgga ggctgaagtg agctgagatt ggaccactgt actccagcct gggtgacaga
11221 gcgagattcc atctcaaaaa aaaaaaaaag aaacaacgaa aaaagaaatg atggcttagc
11281 tccatgtgaa gatgatattt gaacatttta aaacacttta aataaactgt tctctcctgt
11341 ttattgccac tgacaggaga ggtttctctt tacctctggt cctgcacccc tctgagccat
11401 cctacccaca gccttcagtc attgtcctaa agcctagctc taattccact gcctctcctt
11461 ttgtgcacac acacttctct gcttccctgg ccgttctcta tcttggagag gcatttcaaa
11521 cgccacttcc accagaaggc cttgctactg caccaactag ttactatctc ttcttcaccc
11581 aaatcctggt agcactttgg atctcccact tgcacttagg gttcaccttc cgttataatc
11641 attgccatca atctcagcat cgttttaggc acttctttcc agccattgtt cttacctcca
11701 actacatatc ttttctggac tgtgcattat tcagtttatt aaatgcccat taaatgtgtt
11761 tagccattgt caattactct gaaacgttca ggttttgaca aattctttcc taatgtaagt
11821 gtggtggaaa gagtgaaaga aagtcaaatt gcacaaaaat aggatggtgt aatttggggt
11881 tatgccgtca attttgtcca ctgataaatg ggatttgagc tctccaagtt gactagatgc
11941 cctttatttt tcagaaatat tctacatcat tggagctgtg gtatttgtgg tcatcatcct
12001 tgtcatcatc ctggctatat ctctacacaa gtgtagaaag gcaggagtgg ggcagagctg
12061 gaaggagaac tccccactga atgtttcata aaggaagcac tgttggagct actgcaaatg
12121 ctatattgca ctgtgaccga gaacttttaa gaggatagaa tacatggaaa cgcaaatgag
12181 tatttcggag catgaagacc ctggagttca aaaaactctt gatatgacct gttattacca
12241 ttagcattct ggttttgaca tcagcattag tcactttgaa atgtaacgaa tggtactaca
12301 accaattcca agttttaatt tttaacacca tggcaccttt tgcacataac atgctttaga
12361 ttatatattc cgcactcaag gagtaaccag gtcgtccaag caaaaacaaa tgggaaaatg
12421 tcttaaaaaa tcctgggtgg acttttgaaa agcttttttt tttttttttt tttttgagac
12481 ggagtcttgc tctgttgccc aggctggagt gcagtagcac gatctcggct cactgcaccc
12541 tccgtctctc gggttcaagc aattgtctgc ctcagcctcc cgagtagctg ggattacagg
12601 tgcgcactac cacgccaagc taatttttgt attttttagt agagatgggg tttcaccatc
12661 ttggccaggc tggtcttgaa ttcctgacct caggtgatcc acccaccttg gcctcccaaa
12721 gtgctagtat tatgggcgtg aaccaccatg cccagccgaa aagcttttga ggggctgact
12781 tcaatccatg taggaaagta aaatggaagg aaattgggtg catttctagg acttttctaa
12841 catatgtcta taatatagtg tttaggttct tttttttttc aggaatacat ttggaaattc
12901 aaaacaattg gcaaactttg tattaatgtg ttaagtgcag gagacattgg tattctgggc
12961 accttcctaa tatgctttac aatctgcact ttaactgact taagtggcat taaacatttg
13021 agagctaact atatttttat aagactacta tacaaactac agagtttatg atttaaggta
13081 cttaaagctt ctatggttga cattgtatat ataatttttt aaaaaggttt tctatatggg
13141 gattttctat ttatgtaggt aatattgttc tatttgtata tattgagata atttatttaa
13201 tatactttaa ataaaggtga ctgggaattg ttactgttgt acttattcta tcttccattt
13261 attatttatg tacaatttgg tgtttgtatt agctctacta cagtaaatga ctgtaaaatt
13321 gtcagtggct tacaacaacg tatctttttc gcttataata cattttggtg actgtaggct
13381 gactgcactt cttctcaatg ttttctcatt ctaggatgca aaccaatgga gaagccccta
13441 attagatcag ggcagaggga aaaacaaaaa actggtagaa accggcaacc acagcttcaa
13501 gctttaagcc catctcctac acttctgctc tgtacgtgcc cattgtcact tctgttcaca
13561 tgctactgtc ccaagcaagt gaccaagcct gacaatactt tgtctactgg agtcactgca
13621 aggcacatga cggggcaggg atgtcgtctt acagggaaga gaaaagataa tgctctctac
13681 tgcagacttg gagagatttc ttcccattgg cagtagtttg actaattgga gatgagaaaa
13741 aaagaaacat tcttgggatg attgtattga aacaaaatta ggtaaaagga caatatagga
13801 tagggagaga tataagtgga atgagatctc tagagtccat taaaagcaag ctagattgag
13861 agctc
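
The CDS location of the tissue factor record is a join of six exon segments, and the length of the spliced coding sequence follows directly from those coordinates. The sketch below assumes 1-based inclusive coordinates, as in the feature table, and handles only segments local to the record; a location that names another accession (such as `M32992:388..505` in the CETP record) would lose its accession prefix. The function names are hypothetical.

```python
import re

def parse_join(location: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 1-based inclusive, from a join(...) location."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def spliced_length(segments: list[tuple[int, int]]) -> int:
    """Total number of bases covered by the segments."""
    return sum(end - start + 1 for start, end in segments)

# CDS location copied from the HUMTFPB feature table.
cds = "join(922..1021,2190..2301,6392..6591,9289..9467,10075..10234,11955..12091)"
segments = parse_join(cds)
length = spliced_length(segments)
codons, remainder = divmod(length, 3)
print(segments)
print(length, codons, remainder)
```

The six exons contribute 100, 112, 200, 179, 160 and 137 bases, so the coding sequence is 888 bases, or 296 codons with nothing left over. That is 295 residues plus a stop codon, which agrees with the number of residues in the record's /translation qualifier.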
LOCUS HUMCETP7 894 bp DNA PRI 01-NOV-1994

DEFINITION Human cholesteryl ester transfer protein (CETP) gene, exons 15 and
16.

ACCESSION M32998 J02898
NID g180267

KEYWORDS cholesteryl ester transfer protein
SEGMENT 7 of 7

SOURCE Human DNA.

ORGANISM Homo sapiens

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.

REFERENCE 1 (bases 1 to 894)

AUTHORS Agellon,L.B., Quinet,E.M., Gillette,T.G., Drayna,D.T., Brown,M.L.
and Tall,A.R.
JOURNAL Unpublished (1990)

REFERENCE 2 (sites)

AUTHORS Agellon,L.B., Quinet,E.M., Gillette,T.G., Drayna,D.T., Brown,M.L.
and Tall,A.R.

TITLE Organization of the human cholesteryl ester transfer protein gene
JOURNAL Biochemistry 29 (6), 1372-1376 (1990)
MEDLINE 90241928

COMMENT [2] sites for [1]; intron/exon boundaries.
Draft entry and computer-readable sequence for [2] kindly provided
by L.B.Agellon, 16-MAR-1990.

FEATURES Location/Qualifiers
source 1..894

/organism="Homo sapiens"
/db_xref="taxon:9606"
gene join(M32992:388..1656,M32993:1..3446,M32994:1..628,
M32995:1..399,M32996:1..409,M32997:1..1420,1..342)
/gene="CETP"

CDS join(M32992:388..505,M32992:1408..1522,M32993:432..566,
M32993:654..724,M32993:954..1041,M32993:2068..2137,
M32993:2355..2415,M32993:3023..3114,M32994:166..345,
M32995:238..288,M32996:128..292,M32997:375..442,
M32997:770..803,M32997:1285..1357,257..342,523..597)
/note="cholesteryl ester transferase protein precursor"
/codon_start=1
/db_xref="PID:g180269"

/translation="MIAATVLTLALLGNAHACSKGT

AKVIQTAFQRASYPDITGEKAMMLLGQVKYGLHNIQISHLSIASSQVELVEAKSIDVS

IQWSWFKGTLKYGYTTAWWLGIDQSIDFEIDSAIDLQINTQLTCDSGRVRTDAPDC

YLSFHKLLLHLQGEREPGWIKQLFTNFISFTLKLVLKGQICKEINVISNIMADFVQTR

AASILSDGDIGVDISLTGDPVITASYLESHHKGHFIYKNVSEDLPLPTFSPTLLGDSR

MLYFWFSERVFHSLAKVAFQDGRIiMLSLMGDEFKAVLETWGFNTNQEIFQEWGGFP^

QAQVTVHCLKMPKISCQNKGWVNSSVMVKFLFPRPDQQHSVAYTFEEDIVTTVQASY

SKKKLFLSLLDFQITPKTVSNLTESSSESVQSFLQSMITAVGIPEVMSRLEVVFTALM

NSKGVSLFDIINPEIITRDGFLLLQMDFGFPEHLLVDFLQSLS"
prim_transcript <1..772
/note="CETP mRNA and introns"
intron <1..256
/gene="CETP"
/note="CETP intron N"
mat_peptide 257..342
/gene="CETP"
/note="cholesteryl ester transferase protein"
exon 257..342
/gene="CETP"
/note="G00-119-773"
/number=15
intron 343..522
/note="CETP intron O"
exon 523..>597
/note="cholesteryl ester transferase protein precursor"
/number=16
mat_peptide 523..594
/note="cholesteryl ester transferase protein"
polyA_signal 756..762
BASE COUNT 178 a 262 c 256 g 198 t

ORIGIN About 950 bp after segment 6.
1 ggatgggttg ggagctcaag ttttggggca gaagggaatt ttttttggca gcagagtgca 
61 agccctgccg ccaggcaaac tctgctcttc ctcatcctca gaagcacttc ctcactctgc 
121 taaatcaaag tgaaacgcat gtttacagaa tattggtcca aaagggtctc agcatctccc 
181 actacccagg gtgcagagcc tcgggccggc cttgctcccc aagaagggcc gactggggct 
241 ctgtcccctc gcccagggct cgaggtagtg tttacagccc tcatgaacag caaaggcgtg 
301 agcctcttcg acatcatcaa ccctgagatt atcactcgag atgtgagtac aaagcccccc 
361 tcaccagccc ctgttcctgg ggagagaggc ccagacagga ttcctggggt gactgggggc 
421 tgttggggag acagacagag gggcctctac cagcttggct ccctcctggt ggcctgggag 
481 tcagcccagc tcgcccctct ctcctactgc ccctcccttc agggcttcct gctgctgcag 
541 atggactttg gcttccctga gcacctgctg gtggatttcc tccagagctt gagctagaag 
601 tctccaagga ggtcgggatg gggcttgtag cagaaggcaa gcaccaggct cacagctgga 
661 accctggtgt ctcctccagc gtggtggaag ttgggttagg agtacggaga tggagattgg 
721 ctcccaactc ctccctatcc taaaggccca ctggcattaa agtgctgtac ccaagagctg 
781 cggagtcctt cttctgtggc tggcgggtag aggggggggg aagggattgt ctcaccagtg 
841 ccgtccacct cttttcagcc cttccaagca gctgccccca aaccctccaa gctt 
LOCUS HUMCILA 1431 bp mRNA PRI 01-NOV-1994

DEFINITION Human lipoprotein-associated coagulation inhibitor mRNA, complete
cds.

ACCESSION J03225
NID g180545

KEYWORDS lipoprotein-associated coagulation inhibitor.
SOURCE Human placenta, cDNA to mRNA, clone lambda-P9.

ORGANISM Homo sapiens

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1431)

AUTHORS Wun,T.C., Kretzmer,K.K., Girard,T.J., Miletich,J.P. and Broze,G.J.
Jr.

TITLE Cloning and characterization of a cDNA coding for the
lipoprotein-associated coagulation inhibitor shows that it
consists of three tandem Kunitz-type inhibitory domains
JOURNAL J. Biol. Chem. 263 (13), 6001-6004 (1988)
MEDLINE 88198127

COMMENT Draft entry and printed copy of sequence for [1] kindly provided
by T.-C.Wun, 19-MAR-1988.

FEATURES Location/Qualifiers
source 1..1431

/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="2q31-q32.1"
sig_peptide 133..216
/gene="TFPI"
/note="lipoprotein-associated coagulation inhibitor signal peptide"
CDS 133..1047
/gene="TFPI"
/note="lipoprotein-associated coagulation inhibitor precursor"
/codon_start=1
/db_xref="GDB:G00-127-364"
/db_xref="PID:g180546"

/translation="MIYTMKKVHALWASVCLLLNLAPAPLNADSEEDEEHTIITDTEL
PPLKLMHSFCAFKADDGPCKAIMKRFFFNIFTRQCEEFIYGGCEGNQNRFESLEECKK
MCTRDNANRIIKTTLQQEKPDFCFLEEDPGICRGYITRYFYNNQTKQCERFKYGGCLG
NMNNFETLEECKNICEDGPNGFQVDNYGTQLNAVNNSLTPQSTKVPSLFEFHGPSWCL
TPADRGLCRANENRFYYNSVIGKCRPFKYSGCGGNENNFTSKQECLRACKKGFIQRIS
KGGLIKTKRKRKKQRVKIAYEEIFVKNM"
gene 133..1047
/gene="TFPI"
mat_peptide 217..1044
/gene="TFPI"
/note="lipoprotein-associated coagulation inhibitor"

BASE COUNT 479 a 244 c 267 g 441 t

ORIGIN 351 bp upstream of SspI site.

1 ggcgggtctg cttctaaaag aagaagtaga gaagataaat cctgtcttca atacctggaa 

61 ggaaaaacaa aataacctca actccgtttt gaaaaaaaca ttccaagaac tttcatcaga 

121 gattttactt agatgattta cacaatgaag aaagtacatg cactttgggc ttctgtatgc 

181 ctgctgctta atcttgcccc tgcccctctt aatgctgatt ctgaggaaga tgaagaacac 

241 acaattatca cagatacgga gttgccacca ctgaaactta tgcattcatt ttgtgcattc 

301 aaggcggatg atggcccatg taaagcaatc atgaaaagat ttttcttcaa tattttcact 

361 cgacagtgcg aagaatttat atatggggga tgtgaaggaa atcagaatcg atttgaaagt 

421 ctggaagagt gcaaaaaaat gtgtacaaga gataatgcaa acaggattat aaagacaaca 

481 ttgcaacaag aaaagccaga tttctgcttt ttggaagaag atcctggaat atgtcgaggt 

541 tatattacca ggtattttta taacaatcag acaaaacagt gtgaacgttt caagtatggt 

601 ggatgcctgg gcaatatgaa caattttgag acactggaag aatgcaagaa catttgtgaa 

661 gatggtccga atggtttcca ggtggataat tatggaaccc agctcaatgc tgtgaataac 

721 tccctgactc cgcaatcaac caaggttccc agcctttttg aatttcacgg tccctcatgg 

781 tgtctcactc cagcagacag aggattgtgt cgtgccaatg agaacagatt ctactacaat 

841 tcagtcattg ggaaatgccg cccatttaag tacagtggat gtgggggaaa tgaaaacaat 

901 tttacttcca aacaagaatg tctgagggca tgtaaaaaag gtttcatcca aagaatatca 

961 aaaggaggcc taattaaaac caaaagaaaa agaaagaagc agagagtgaa aatagcatat 

1021 gaagaaattt ttgttaaaaa tatgtgaatt tgttatagca atgtaacatt aattctacta 

1081 aatattttat atgaaatgtt tcactatgat tttctatttt tcttctaaaa tcgttttaat 

1141 taatatgttc attaaatttt ctatgcttat tgtacttgtt atcaacacgt ttgtatcaga 

1201 gttgcttttc taatcttgtt aaattgctta ttctaggtct gtaatttatt aactggctac 

1261 tgggaaatta cttattttct ggatctatct gtattttcat ttaactacaa attatcatac 

1321 taccggctac atcaaatcag tcctttgatt ccatttggtg accatctgtt tgagaatatg 

1381 atcatgtaaa tgattatctc ctttatagcc tgtaaccaga ttaagccccc c 
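
The /translation qualifier of a record like this one can be cross-checked against the nucleotide sequence by translating from the first base of the CDS. The following is a minimal sketch assuming the standard genetic code; `translate` and the table construction are illustrative helpers, not part of any GenBank software, and only the sequence line covering bases 121 to 180 is used.

```python
BASES = "tcag"
# Standard genetic code, codons ordered ttt, ttc, tta, ttg, tct, ... ggg.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna: str, start: int) -> str:
    """Translate dna from 1-based position start until a stop codon or the end."""
    seq = dna.replace(" ", "").lower()
    protein = []
    for pos in range(start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Bases 121..180 of the HUMCILA sequence; the CDS starts at base 133,
# which is position 13 within this fragment.
fragment = "gattttactt agatgattta cacaatgaag aaagtacatg cactttgggc ttctgtatgc"
peptide = translate(fragment, 133 - 120)
print(peptide)
```

The 16 residues obtained this way match the start of the record's /translation qualifier, which confirms that the reading frame begins at base 133.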
LOCUS HUMPRC 1366 bp mRNA PRI 08-JAN-1995

DEFINITION Human protein C, mRNA.

ACCESSION K02059
NID g190322

KEYWORDS glycoprotein; protease; protein C; serine protease.
SOURCE Human liver, cDNA (library of Woo) to mRNA, clones lambda-HC1026
and lambda-HC1375.

ORGANISM Homo sapiens

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1366)
AUTHORS Foster,D. and Davie,E.W.
TITLE Characterization of a cDNA coding for human protein C
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 81 (15), 4766-4770 (1984)
MEDLINE 84272714

COMMENT Protein C is a precursor to a serine protease called "activated
protein C" that has a strong anticoagulant activity. The amino acid
sequence as determined from the cDNA indicates that protein C is
synthesized as a single-chain polypeptide containing the light
chain and the heavy chain connected by a dipeptide of Lys-Arg. This
precursor peptide is then converted to the light and heavy chains
by cleavage of two or more internal peptide bonds. The amino acid
sequence of human protein C shows a high homology with that of the
bovine molecule. Two clones were sequenced in [1] and shown to
code for human protein C. Clone lambda-HC1026 covers bp 146-1140,
and clone lambda-HC1375 covers bp 1-1366. The two cDNA clones had
a poly-A tail at different positions; both poly-A sites were
preceded by poly-A signals [1].
Location/Qualifiers
1..1366
/organism="Homo sapiens"
/db_xref="taxon:9606"
/tissue_type="liver"
/tissue_lib="of Woo"
/map="2q13-q21"
<1..1366
/gene="PROC"
/note="G00-120-317"
<1..1140
/gene="PROC"
/note="G00-120-317"
1..1366
/gene="PROC"
<1..277
/gene="PROC"
/note="G00-120-317"
/product="protein C light chain"
<1..1073
/gene="PROC"
/note=" . ■ 
/codon_start=2
/db_xref="GDB:G00-120-317"
/product="protein C"
/db_xref="PID:g190323"
/translation="QGHGTCIDGIGSFSCDCRSGWEGRFCQREVSFLNCSLDNGGCTH
YCLEEVGWRRCSCAPGYKLGDDLLQCHPAVKFPCGRPWKRMEKKRSHLKRDTEDQEDQ
TOPRLIDGKMTRRGDSPWQVV^LDSKKKIACGAVIilHPSWvX 
GEYDU^RWEKWEI^LDIKEVTvlIPNYSKSTTDNDIALLHI^ 

FIG. 11A 
GLAERELNQAGQETLVTGWGYHSSREKEAKRNRTFVLNFIKIPVVPHNECSEVMSNMV
SENMLCAGILGDRQDACEGDSGGPMVASFHGTWFLVGLVSWGEGCGLLHNYGVYTKVS
RYLDWIHGHIRDKEAPQKSWAP"
mat_peptide 284..1069
/gene="PROC"
/note="G00-120-317"
/product="protein C heavy chain"
mat_peptide 320..1069
/gene="PROC"
/note="G00-120-317"
/product="protein C activated heavy chain"
BASE COUNT 302 a 388 c 425 g 251 t
ORIGIN 207 bp upstream of PstI site; chromosome 2q14-q21.
1 ccaagggcac ggcacgtgca tcgacggcat cggcagcttc agctgcgact gccgcagcgg
61 ctgggagggc cgcttctgcc agcgcgaggt gagcttcctc aattgctctc tggacaacgg
121 cggctgcacg cattactgcc tagaggaggt gggctggcgg cgctgtagct gtgcgcctgg
181 ctacaagctg ggggacgacc tcctgcagtg tcaccccgca gtgaagttcc cttgtgggag
241 gccctggaag cggatggaga agaagcgcag tcacctgaaa cgagacacag aagaccaaga
301 agaccaagta gatccgcggc tcattgatgg gaagatgacc aggcggggag acagcccctg
361 gcaggtggtc ctgctggact caaagaagaa gctggcctgc ggggcagtgc tcatccaccc
421 ctcctgggtg ctgacagcgg cccactgcat ggacgagtcc aagaagctcc ttgtcaggct
481 tggagagtat gacctgcggc gctgggagaa gtgggagctg gacctggaca tcaaggaggt
541 cttcgtccac cccaactaca gcaagagcac caccgacaat gacatcgcac tgctgcacct
601 ggcccagccc gccaccctct cgcagaccat agtgcccatc tgcctcccgg acagcggcct
661 tgcagagcgc gagctcaatc aggccggcca ggagaccctc gtgacgggct ggggctacca
721 cagcagccga gagaaggagg ccaagagaaa ccgcaccttc gtcctcaact tcatcaagat
781 tcccgtggtc ccgcacaatg agtgcagcga ggtcatgagc aacatggtgt ctgagaacat
841 gctgtgtgcg ggcatcctcg gggaccggca ggatgcctgc gagggcgaca gtggggggcc
901 catggtcgcc tccttccacg gcacctggtt cctggtgggc ctggtgagct ggggtgaggg
961 ctgtgggctc cttcacaact acggcgttta caccaaagtc agccgctacc tcgactggat
1021 ccatgggcac atcagagaca aggaagcccc ccagaagagc tgggcacctt agcgaccctc
1081 cctgcagggc tgggcttttg catggcaatg gatgggacat taaagggaca tgtaacaagc
1141 acaccggcct gctgttctgt ccttccatcc ctcttttggg ctcttctgga gggaagtaac
1201 atttactgag cacctgttgt atgtcacatg ccttatgaat agaatcttaa ctcctagagc
1261 aactctgtcg ggtggggagg agcagatcca agttttgcgg ggtctaaagc tgtgtgtgtt
1321 gagggggata ctctgtttat gaaaaagaat aaaaaacaca accacg
LOCUS HUMLDLR02 144 bp DNA PRI 30-NOV-1994
DEFINITION Human low density lipoprotein receptor gene, exon 2.
ACCESSION L00336 K02573
NID g187078
KEYWORDS low density lipoprotein receptor-1; repeat region.
SEGMENT 2 of 18
SOURCE Human DNA [2] and fetal adrenal gland, cDNA to mRNA, clone pLDLR-2
[1].
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 16 to 138)
AUTHORS Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L.,
Goldstein,J.L. and Russell,D.W.
TITLE The human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA
JOURNAL Cell 39 (1), 27-38 (1984)
MEDLINE 85024898
REFERENCE 2 (bases 1 to 23; 132 to 144)
AUTHORS Sudhof,T.C., Goldstein,J.L., Brown,M.S. and Russell,D.W.
TITLE The LDL receptor gene: a mosaic of exons shared with different
proteins
JOURNAL Science 228 (4701), 815-822 (1985)
MEDLINE 85218750
COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by D.Russell, 01-MAR-1985.
FEATURES Location/Qualifiers
source 1..144
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="19p13.3"
intron <1..15
/gene="LDLR"
/note="LDL intron A"
exon 16..138
/gene="LDLR"
/note="G00-119-362"
/number=2
intron 139..>144
/gene="LDLR"
/note="LDL intron B"
BASE COUNT 33 a 33 c 46 g 32 t
ORIGIN Chromosome 19p13.2-p13.1; about 10 kb after segment 1.
1 tttcctctct ctcagtgggc gacagatgtg aaagaaacga gttccagtgc caagacggga 
61 aatgcatctc ctacaagtgg gtctgcgatg gcagcgctga gtgccaggat ggctctgatg 
121 agtcccagga gacgtgctgt gagt 
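
The BASE COUNT line of the HUMLDLR02 record can be cross-checked against its ORIGIN block, which is a quick way to catch transcription errors in a short segment. The following is a minimal sketch in Python, assuming each sequence line starts with a position label followed by groups of ten bases; the helper name `join_origin` is hypothetical and is not part of the record or of any GenBank tool.

```python
from collections import Counter

# Sequence lines copied from the ORIGIN block of the HUMLDLR02 record.
ORIGIN_LINES = [
    "1 tttcctctct ctcagtgggc gacagatgtg aaagaaacga gttccagtgc caagacggga",
    "61 aatgcatctc ctacaagtgg gtctgcgatg gcagcgctga gtgccaggat ggctctgatg",
    "121 agtcccagga gacgtgctgt gagt",
]


def join_origin(lines):
    """Drop the leading position label of each line and join the base groups."""
    return "".join("".join(line.split()[1:]) for line in lines)


sequence = join_origin(ORIGIN_LINES)
counts = Counter(sequence)

# Prints the total length, then the a, c, g and t counts.
print(len(sequence), counts["a"], counts["c"], counts["g"], counts["t"])
```

The printed values can be compared with the 144 bp length on the LOCUS line and with the four figures on the BASE COUNT line of the same record.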
LOCUS HUMLDLR04 402 bp DNA PRI 30-NOV-1994
DEFINITION Human low density lipoprotein receptor gene, exon 4.
ACCESSION L00338 K02573
NID g187080
KEYWORDS low density lipoprotein receptor-1; repeat region.
SEGMENT 4 of 18
SOURCE Human DNA [2] and fetal adrenal gland, cDNA to mRNA, clone pLDLR-2
[1].
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 16 to 396)
AUTHORS Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L.,
Goldstein,J.L. and Russell,D.W.
TITLE The human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA
JOURNAL Cell 39 (1), 27-38 (1984)
MEDLINE 85024898
REFERENCE 2 (bases 1 to 23; 389 to 402)
AUTHORS Sudhof,T.C., Goldstein,J.L., Brown,M.S. and Russell,D.W.
TITLE The LDL receptor gene: a mosaic of exons shared with different
proteins
JOURNAL Science 228 (4701), 815-822 (1985)
MEDLINE 85218750
COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by D.Russell, 01-MAR-1985.
FEATURES Location/Qualifiers
source 1..402
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="19p13.3"
intron <1..15
/gene="LDLR"
/note="LDL intron C"
exon 16..396
/gene="LDLR"
/note="G00-119-362"
/number=4
intron 397..>402
/gene="LDLR"
/note="LDL intron D"
BASE COUNT 73 a 131 c 120 g 78 t
ORIGIN Chromosome 19p13.2-p13.1; about 2.4 kb after segment 3.
1 catccatccc tgcagccccc aagacgtgct cccaggacga gtttcgctgc cacgatggga 
61 agtgcatctc tcggcagttc gtctgtgact cagaccggga ctgcttggac ggctcagacg 
121 aggcctcctg cccggtgctc acctgtggtc ccgccagctt ccagtgcaac agctccacct 
181 gcatccccca gctgtgggcc tgcgacaacg accccgactg cgaagatggc tcggatgagt 
241 ggccgcagcg ctgtaggggt ctttacgtgt tccaagggga cagtagcccc tgctcggcct 
301 tcgagttcca ctgcctaagt ggcgagtgca tccactccag ctggcgctgt gatggtggcc 
361 ccgactgcaa ggacaaatct gacgaggaaa actgcggtat gg 
LOCUS HUMLDLR09 193 bp DNA PRI 30-NOV-1994
DEFINITION Human low density lipoprotein receptor gene, exon 9.
ACCESSION L00343 K02573
NID g187085
KEYWORDS low density lipoprotein receptor-1; repeat region.
SEGMENT 9 of 18
SOURCE Human DNA [2] and fetal adrenal gland, cDNA to mRNA, clone pLDLR-2
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 16 to 187)
AUTHORS Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L.,
Goldstein,J.L. and Russell,D.W.
TITLE The human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA
JOURNAL Cell 39 (1), 27-38 (1984)
MEDLINE 85024898
REFERENCE 2 (bases 1 to 23; 180 to 193)
AUTHORS Sudhof,T.C., Goldstein,J.L., Brown,M.S. and Russell,D.W.
TITLE The LDL receptor gene: a mosaic of exons shared with different
proteins
JOURNAL Science 228 (4701), 815-822 (1985)
MEDLINE 85218750
COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by D.Russell, 01-MAR-1985.
FEATURES Location/Qualifiers
source 1..193
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="19p13.3"
intron <1..15
/gene="LDLR"
/note="LDL intron H"
exon 16..187
/gene="LDLR"
/note="G00-119-362"
/number=9
intron 188..>193
/gene="LDLR"
/note="LDL intron I"
BASE COUNT 44 a 64 c 52 g 33 t
ORIGIN Chromosome 19p13.2-p13.1; about 1.2 kb after segment 8.
1 tccccggacc cccaggctcc atcgcctacc tcttcttcac caaccggcac gaggtcagga
61 agatgacgct ggaccggagc gagtacacca gcctcatccc caacctgagg aacgtggtcg
121 ctctggacac ggaggtggcc agcaatagaa tctactggtc tgacctgtcc cagagaatga
181 tctgcaggtg agc
LOCUS HUMLDLR10 249 bp DNA PRI 30-NOV-1994
DEFINITION Human low density lipoprotein receptor gene, exon 10.
ACCESSION L00344 K02573
NID g187086
KEYWORDS low density lipoprotein receptor-1; repeat region.
SEGMENT 10 of 18
SOURCE Human DNA [2] and fetal adrenal gland, cDNA to mRNA, clone pLDLR-2
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 16 to 243)
AUTHORS Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L.,
Goldstein,J.L. and Russell,D.W.
TITLE The human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA
JOURNAL Cell 39 (1), 27-38 (1984)
MEDLINE 85024898
REFERENCE 2 (bases 1 to 23; 236 to 249)
AUTHORS Sudhof,T.C., Goldstein,J.L., Brown,M.S. and Russell,D.W.
TITLE The LDL receptor gene: a mosaic of exons shared with different
proteins
JOURNAL Science 228 (4701), 815-822 (1985)
MEDLINE 85218750
COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by D.Russell, 01-MAR-1985.
FEATURES Location/Qualifiers
source 1..249
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="19p13.3"
intron <1..15
/gene="LDLR"
/note="LDL intron I"
exon 16..243
/gene="LDLR"
/note="G00-119-362"
/number=10
intron 244..>249
/gene="LDLR"
/note="LDL intron J"
BASE COUNT 51 a 77 c 71 g 50 t
ORIGIN Chromosome 19p13.2-p13.1; about 900 bp after segment 9.
1 ctcctcctgc ctcagcaccc agcttgacag agcccacggc gtctcttcct atgacaccgt
61 catcagcagg gacatccagg cccccgacgg gctggctgtg gactggatcc acagcaacat
121 ctactggacc gactctgtcc tgggcactgt ctctgttgcg gataccaagg gcgtgaagag
181 gaaaacgtta ttcagggaga acggctccaa gccaagggcc atcgtggtgg atcctgttca
241 tgggcgcgt
LOCUS HUMLDLR11 140 bp DNA PRI 30-NOV-1994
DEFINITION Human low density lipoprotein receptor gene, exon 11.
NID g187087
KEYWORDS low density lipoprotein receptor-1; repeat region.
SEGMENT 11 of 18
SOURCE Human DNA [2] and fetal adrenal gland, cDNA to mRNA, clone pLDLR-2
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 6 to 134)
AUTHORS Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L.,
Goldstein,J.L. and Russell,D.W.
TITLE The human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA
JOURNAL Cell 39 (1), 27-38 (1984)
MEDLINE 85024898
REFERENCE 2 (bases 1 to 22; 128 to 140)
AUTHORS Sudhof,T.C., Goldstein,J.L., Brown,M.S. and Russell,D.W.
TITLE The LDL receptor gene: a mosaic of exons shared with different
proteins
JOURNAL Science 228 (4701), 815-822 (1985)
MEDLINE 85218750
COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by D.Russell, 01-MAR-1985.
FEATURES Location/Qualifiers
source 1..140
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="19p13.3"
intron <1..15
/gene="LDLR"
/note="LDL intron J"
exon 16..134
/gene="LDLR"
/note="G00-119-362"
/number=11
intron 135..>140
/gene="LDLR"
/note="LDL intron K"
BASE COUNT 34 a 38 c 37 g 31 t
ORIGIN Chromosome 19p13.2-p13.1; about 2.6 kb after segment 10.
1 ctgtcctccc accagcttca tgtactggac tgactgggga actcccgcca agatcaagaa 
61 agggggcctg aatggtgtgg acatctactc gctggtgact gaaaacattc agtggcccaa 
121 tggcatcacc ctaggtatgt 



FIG. 16 
LOCUS HUMLDLR13 163 bp DNA PRI 30-NOV-1994
DEFINITION Human low density lipoprotein receptor gene, exon 13.
ACCESSION L00347 K02573
NID g187089
KEYWORDS low density lipoprotein receptor-1; repeat region.
SEGMENT 13 of 18
SOURCE Human DNA [2] and fetal adrenal gland, cDNA to mRNA, clone pLDLR-2
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 16 to 157)
AUTHORS Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L.,
Goldstein,J.L. and Russell,D.W.
TITLE The human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA
JOURNAL Cell 39 (1), 27-38 (1984)
MEDLINE 85024898
REFERENCE 2 (bases 1 to 24; 151 to 163)
AUTHORS Sudhof,T.C., Goldstein,J.L., Brown,M.S. and Russell,D.W.
TITLE The LDL receptor gene: a mosaic of exons shared with different
proteins
JOURNAL Science 228 (4701), 815-822 (1985)
MEDLINE 85218750
COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by D.Russell, 01-MAR-1985.
FEATURES Location/Qualifiers
source 1..163
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="19p13.3"
intron <1..15
/gene="LDLR"
/note="LDL intron L"
exon 16..157
/gene="LDLR"
/note="G00-119-362"
/number=13
intron 158..>163
/gene="LDLR"
/note="LDL intron M"
BASE COUNT 43 a 45 c 34 g 41 t
ORIGIN Chromosome 19p13.2-p13.1; about 3 kb after segment 12.
1 ttgctgcctg tttaggacaa agtattttgg acagatatca tcaacgaagc cattttcagt
61 gccaaccgcc tcacaggttc cgatgtcaac ttgttggctg aaaacctact gtccccagag
121 gatatggtcc tcttccacaa cctcacccag ccaagaggta agg



FIG. 17 
LOCUS HUMLDLR15 192 bp DNA PRI 30-NOV-1994
DEFINITION Human low density lipoprotein receptor gene, exon 15.
ACCESSION L00349 K02573
NID g187091
KEYWORDS low density lipoprotein receptor-1; repeat region.
SEGMENT 15 of 18
SOURCE Human DNA [2] and fetal adrenal gland, cDNA to mRNA, clone pLDLR-2
[1].
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 16 to 186)
AUTHORS Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L.,
Goldstein,J.L. and Russell,D.W.
TITLE The human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA
JOURNAL Cell 39 (1), 27-38 (1984)
MEDLINE 85024898
REFERENCE 2 (bases 1 to 23; 179 to 192)
AUTHORS Sudhof,T.C., Goldstein,J.L., Brown,M.S. and Russell,D.W.
TITLE The LDL receptor gene: a mosaic of exons shared with different
proteins
JOURNAL Science 228 (4701), 815-822 (1985)
MEDLINE 85218750
COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by D.Russell, 01-MAR-1985.
FEATURES Location/Qualifiers
source 1..192
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="19p13.3"
intron <1..15
/gene="LDLR"
/note="LDL intron N"
exon 16..186
/gene="LDLR"
/note="G00-119-362"
/number=15
intron 187..>192
/gene="LDLR"
/note="LDL intron O"
BASE COUNT 46 a 64 c 49 g 33 t
ORIGIN Chromosome 19p13.2-p13.1; about 2.8 kb after segment 14.
1 tatttattct ttcagaggct gaggctgcag tggccaccca ggagacatcc accgtcaggc
61 taaaggtcag ctccacagcc gtaaggacac agcacacaac cacccggcct gttcccgaca
121 cctcccggct gcctggggcc acccctgggc tcaccacggt ggagatagtg acaatgtctc
181 accaaggtaa ag



FIG. 18 



SUBSTITUTE SHEET (RULE 26) 



WO 99/50454 



PCT/US99/06473



36/97 



LOCUS HUMLDLR17 179 bp DNA PRI 30-NOV-1994
DEFINITION Human low density lipoprotein receptor gene, exon 17.
ACCESSION L00351 K02573
NID g187093
KEYWORDS low density lipoprotein receptor-1; repeat region.
SEGMENT 17 of 18
SOURCE Human DNA [3] and fetal adrenal gland, cDNA to mRNA, clone pLDLR-2
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 16 to 173)
AUTHORS Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L.,
Goldstein,J.L. and Russell,D.W.
TITLE The human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA
JOURNAL Cell 39 (1), 27-38 (1984)
MEDLINE 85024898
REFERENCE 2 (bases 57 to 101)
AUTHORS Lehrman,M.A., Goldstein,J
Schneider,W.J.
TITLE Internalization-defective LDL receptors produced by genes with
nonsense and frameshift mutations that truncate the cytoplasmic
domain
JOURNAL Cell 41 (3), 735-743
MEDLINE 85228224
REFERENCE 3 (bases 1
AUTHORS Sudhof,T.C.,
TITLE The LDL receptor gene: a mosaic of exons shared with different
proteins
JOURNAL Science 228 (4701), 815-822 (1985)
MEDLINE 85218750
COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by D.Russell, 01-MAR-1985.
FEATURES Location/Qualifiers
source 1..179
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="19p13.3"
intron <1..15
/gene="LDLR"
/note="LDL intron P"
exon 16..173
/gene="LDLR"
/note="G00-119-362"
/number=17
mutation 76..77
/gene="LDLR"
/note="ac in wt; aagaac in internalization-defective
familial hypercholesterolemia [2]"
intron 174..>179
/gene="LDLR"
/note="LDL intron Q"
BASE COUNT 42 a 56 c 39 g 42 t
ORIGIN Chromosome 19p13.2-p13.1; about 1.4 kb after segment 16.
1 tgcctctccc tacagtgctc ctcgtcttcc tttgcctggg ggtcttcctt ctatggaaga
61 actggcggct taagaacatc aacagcatca actttgacaa ccccgtctat cagaagacca
121 cagaggatga ggtccacatt tgccacaacc aggacggcta cagctacccc tcggtgagt



FIG. 19 
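
The mutation feature of the HUMLDLR17 record (positions 76..77, "ac in wt; aagaac in internalization-defective familial hypercholesterolemia") can be checked against the printed sequence. The following is a minimal sketch in Python, assuming the note means that the wild-type dinucleotide at 1-based positions 76..77 is replaced by the six bases `aagaac`; the function name `apply_replacement` is hypothetical and introduced only for illustration.

```python
# Exon 17 segment of the LDL receptor gene, copied from the ORIGIN block
# of the HUMLDLR17 record (179 bp, written here in groups of ten bases).
WILD_TYPE = (
    "tgcctctccc" "tacagtgctc" "ctcgtcttcc" "tttgcctggg" "ggtcttcctt" "ctatggaaga"
    "actggcggct" "taagaacatc" "aacagcatca" "actttgacaa" "ccccgtctat" "cagaagacca"
    "cagaggatga" "ggtccacatt" "tgccacaacc" "aggacggcta" "cagctacccc" "tcggtgagt"
)


def apply_replacement(seq, start, end, new):
    """Replace 1-based inclusive positions start..end of seq with new."""
    return seq[:start - 1] + new + seq[end:]


# Assumption: the mutant allele carries "aagaac" in place of wild-type "ac".
mutant = apply_replacement(WILD_TYPE, 76, 77, "aagaac")
net_change = len(mutant) - len(WILD_TYPE)

# A net change that is not a multiple of three shifts the reading frame.
shifts_frame = net_change % 3 != 0

print(WILD_TYPE[75:77], len(WILD_TYPE), len(mutant), shifts_frame)
```

Under this reading the wild-type bases at 76..77 are indeed `ac`, and the net gain of four bases is not a multiple of three, which would be consistent with the frameshift mutations named in the title of reference 2 of the record.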






LOCUS HUMLDLR01 769 bp DNA PRI 30-NOV-1994
DEFINITION Human low density lipoprotein receptor gene, exon 1.
ACCESSION L29401 K02573 M10664 N00033
NID g460288
KEYWORDS low density lipoprotein receptor-1; repeat region.
SEGMENT 1 of 18
SOURCE Human DNA [2] and fetal adrenal gland, cDNA to mRNA, clone pLDLR-2
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (sites)
AUTHORS Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L.,
Goldstein,J.L. and Russell,D.W.
TITLE The human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA
JOURNAL Cell 39 (1), 27-38 (1984)
MEDLINE 85024898
REFERENCE 2 (bases 1 to 769)
AUTHORS Sudhof,T.C., Goldstein,J.L., Brown,M.S. and Russell,D.W.
TITLE The LDL receptor gene: a mosaic of exons shared with different
proteins
JOURNAL Science 228 (4701), 815-822 (1985)
MEDLINE 85218750
COMMENT Bases 1-769 from Science 228, 815-822 (1985)
Bases 675-754 from Cell 39, 27-38 (1984)

Draft entry and computer-readable sequence for [1] kindly provided
by D.Russell, 01-MAR-1985.
FEATURES Location/Qualifiers
source 1..769
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="19p13.3"
exon 595..754
/gene="LDLR"
/note="low density lipoprotein receptor; G00-119-362"
/number=1
sig_peptide 688..750
/gene="LDLR"
/note="low density lipoprotein receptor signal peptide"
intron 755..>769
/gene="LDLR"
/note="LDL intron A"
BASE COUNT 220 a 169 c 194 g 186 t
ORIGIN Chromosome 19p13.2-p13.1; 1 bp upstream of BamHI site.

1 ggatcccaca aaacaaaaaa tatttttttg gctgtacttt tgtgaagatt ttatttaaat 
61 tcctgattga tcagtgtcta ttaggtgatt tggaataaca atgtaaaaac aatatacaac 
121 gaaaggaagc taaaaatcta tacacaattc ctagaaagga aaaggcaaat atagaaagtg 
181 gcggaagttc ccaacatttt tagtgttttc cttttgaggc agagaggaca atggcattag 
241 gctattggag gatcttgaaa ggctgttgtt atccttctgt ggacaacaac agcaaaatgt 
301 taacagttaa acatcgagaa atttcaggag gatctttcag aagatgcgtt tccaattttg 
361 agggggcgtc agctcttcac cggagaccca aatacaacaa atcaagtcgc ctgccctggc 
421 gacactttcg aaggactgga gtgggaatca gagcttcacg ggttaaaagc cgatgtcaca 
481 tcggccgttc gaaactcctc ctcttgcagt gaggtgaaga catttgaaaa tcaccccact 
541 gcaaactcct ccccctgcta gaaacctcac attgaaatgc tgtaaatgac gtgggccccg 
601 agtgcaatcg cgggaagcca gggtttccag ctaggacaca gcaggtcgtg atccgggtcg 
661 ggacactgcc tggcagaggc tgcgagcatg gggccctggg gctggaaatt gcgctggacc 
721 gtcgccttgc tcctcgccgc ggcggggact gcaggtaagg cttgctcca 
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LOCUS HUMF511 279 bp DNA PRI 

DEFINITION Human coagulation factor V gene, exon 11. 
ACCESSION L32765 J05368 
NID g488094 

KEYWORDS coagulation factor V; factor V. 
SEGMENT 11 of 25 

SOURCE Homo sapiens DNA. 

ORGANISM Homo sapiens 

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo. 
REFERENCE 1 (bases 1 to 279) 

AUTHORS Kane,W.H. and Davie, E.W. 

TITLE Cloning of a cDNA coding for human factor V, a blood coagulation 

factor homologous to factor VIII and ceruloplasmin 
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 83 (18), 6800-6804 (1986)
MEDLINE 86313665 
REFERENCE 2 (bases 1 to 279) 

AUTHORS Kane,W.H., Ichinose,A., Hagen,F.S. and Davie, E.W. 

TITLE Cloning of cDNAs coding for the heavy chain region and connecting 

region of human factor V, a blood coagulation factor with four 

types of internal repeats 
JOURNAL Biochemistry 26 (20), 6508-6514 (1987) 
MEDLINE 88107560 
REFERENCE 3 (bases 1 to 279) 

AUTHORS Jenny,R.J., Pittman,D.D., Toole,J.J., Kriz,R.W., Aldape,R.A.,
Hewick,R.M., Kaufman,R.J. and Mann,K.G.
TITLE Complete cDNA and derived amino acid sequence of human factor V 
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 84 (14), 4846-4850 (1987) 
MEDLINE 87260886 
REFERENCE 4 (bases 1 to 279) 

AUTHORS Cripe,L.D., Moore, K.D. and Kane, W.H. 
TITLE Structure of the gene for human coagulation factor V 
JOURNAL Biochemistry 31 (15), 3777-3785 (1992) 
MEDLINE 92232668 
REFERENCE 5 (bases 1 to 279) 

AUTHORS Shen,N.L., Fan,S.T., Pyati,J., Graff, R. , LaPolla,R.J. and 

Edgington,T.S. 

TITLE The serine protease cofactor factor V is synthesized by 
lymphocytes 

JOURNAL J. Immunol. 150 (7), 2992-3001 (1993) 
MEDLINE 93203619 
FEATURES Location/Qualifiers
source 1..279
/organism="Homo sapiens"
/db_xref="taxon:9606"
/tissue_type="placenta"
/cell_type="fibroblast"
/map="1q21-q25"
intron order(L32764:277..>319,<1..74)
/gene="F5"
/note="3.1 kb gap; G00-119-896"
/number=10
exon 75..225
/gene="F5"
/note="G00-119-896"
/number=11
BASE COUNT 73 a 52 c 61 g 93 t
ORIGIN

1 tctgagttct ctattctgtt ccattggtct atgcgtctgt tcttgtacca gtactatact 

61 gttttgtcct ccagagggca gcagacatcg aacagcaggc tgtgtttgct gtgtttgatg 

121 agaacaaaag ctggtacctt gaggacaaca tcaacaagtt ttgtgaaaat cctgatgagg 

181 tgaaacgtga tgaccccaag ttttatgaat caaacatcat gagcagtaao tcagagtact 

241 atttttgttc atcagttttt cattcctgtg gctgaaata 
LOCUS HUMHMGCOA 2904 bp mRNA PRI 08-NOV-1994
DEFINITION Human 3-hydroxy-3-methylglutaryl coenzyme A reductase mRNA,
complete cds.
ACCESSION M11058
NID g184243
KEYWORDS 3-hydroxy-3-methylglutaryl coenzyme A reductase; glycoprotein.
SOURCE Human fetal adrenal gland, cDNA to mRNA, library of T.Maniatis,
clone pHRed-102.
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 2904)
AUTHORS Luskey,K.L. and Stevens,B.
TITLE Human 3-hydroxy-3-methylglutaryl coenzyme A reductase. Conserved
domains responsible for catalytic activity and sterol-regulated
degradation
JOURNAL J. Biol. Chem. 260 (18), 10271-10277 (1985)
MEDLINE 85261451
COMMENT Draft entry and sequence in computer readable form for [1] kindly
provided by K.L.Luskey, 16-JAN-1986.

HMG-CoA reductase is the rate-limiting enzyme for cholesterol
synthesis and is regulated via a negative feedback mechanism
mediated by sterols and non-sterol metabolites derived from
mevalonate, the product of the reaction catalyzed by reductase.
Normally in mammalian cells this enzyme is suppressed by
cholesterol derived from the internalization and degradation of low
density lipoprotein (LDL) via the LDL receptor. Competitive
inhibitors of the reductase induce the expression of LDL receptors
in the liver, which in turn increases the catabolism of plasma LDL
and lowers the plasma concentration of cholesterol, an important
determinant of atherosclerosis.

The sequence coding for the highly conserved membrane bound region 
of the protein is located at positions 51-1067, that coding for the
linker part of the protein at positions 1068-1397 and for the
strongly conserved water-soluble catalytic part at positions 
1398-2714. 
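
The three coordinate ranges quoted in this COMMENT can be converted to codon counts and compared with the coding region given in the feature table (51..2717). The following is a minimal sketch in Python, assuming the coordinates are 1-based and inclusive; the names `REGIONS`, `CDS` and `span_length` are hypothetical and used only for illustration.

```python
# Coordinates quoted from the COMMENT text and the 51..2717 coding region
# of the HUMHMGCOA record; all are treated as 1-based, inclusive positions.
REGIONS = {
    "membrane-bound": (51, 1067),
    "linker": (1068, 1397),
    "catalytic": (1398, 2714),
}
CDS = (51, 2717)


def span_length(span):
    """Number of bases covered by an inclusive (start, end) pair."""
    start, end = span
    return end - start + 1


codons = {name: span_length(span) // 3 for name, span in REGIONS.items()}
total_codons = sum(codons.values())
cds_codons = span_length(CDS) // 3

print(codons, total_codons, cds_codons)
```

Each of the three ranges is a whole number of codons, and together they account for all but the last three bases of the 51..2717 region. A plausible reading, stated here as an assumption, is that positions 2715..2717 hold the stop codon.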

Location/Qualifiers
1..2904
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="5q13.3-q14"
<1..>2904
/note="HMG CoA mRNA"
51..2717
/gene="HMGCR"
51..2717
/gene="HMGCR"
/note="3-hydroxy-3-methylglutaryl coenzyme A reductase"
/codon_start=1
/db_xref="GDB:G00-119-312"
/db_xref="PID:g306865"

/ 1 ran si a t i on= ■ MLSRLFRMHGLFVASHPWEVT VGTVTLTICMMSMNMFTGNNKIC 

GWNYECPKFEEDVLSSDIIILTITRCIAILYIYFQFQNLRQLGSKYILGIAGLFTIFS 

SFVTSTWIHFLDKELTGLNEALPFFLLLIDLSRASTIJtfCFALSSNSQ 

MAIIXPTFTLDALVTSCLVIGVGTMSGTOQLEIMCCFGCM^ 



the 



FEATURES 

source 



mRNA 
gene 
CDS 



VLELSRESREGRPIWQLSHFARVXEEEENKPNPVTQRVKMIMSIXSLVLVHAHSRWI^ 
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PSPQNSTAOTSKVSLGLDENVSKRIEPSVSLWQFYLSKMI SMDIEQVITLSLALLLAV 
KYIFFEQTETESTLSLKNPITSPVVTQKKTODNCCRREPMLVRNNQKCDSVEEETGIN 
RERKVEVIKPLVAETiyTPNRATFWGNSSLLDTSSVLVTQEPEIELPREPRPNEECLQ 
I LGNAEKGAKFLSDAE 1 1 QLVNAKH I PAYKLETLMETHERGVS I RRQLLSKKLS EPS S 
LQYLPYRDYNYSLVMGACCENVIGYMPIPVGVAGPLCLDEKEFQVPMATTEGCLVAST 
NRGCRAIGLGGGASSRVLAIX3MTRGPVVRLPRACDSAEVKAWLETSEGFAVIKEAFDS 
TSRFARI^KIiHTSIAGRNLYIRFQSRSGDAMGMNMISKGTEKALSKI^EYFPEMQILA 
VSGNYCTDKKPAAINWI EGRGKSWCEAVI PAKVVREVLKTTTEAMIEVNINKNLVGS 
AMAGSIGGYNAHAANI VTAI YI ACGQDAAQNVGS SNC ITLMEASGPTNEDLY I SCTMP 

S I EIGTVGGGTNIiPQQACLQMLGVQGACKDNPGENARQL^ 

LAAGHLVKSHMIHNRSKINLQDLQGACTKKTA " 
BASE COUNT 822 a 597 c 678 g 807 t 



ORIGIN 

6 
12 
18 
24 
30 
36 
42 
48 
54 
60 
66 
72 
78 
84 
90 
96 
102 
108 
114 
120 
126 
132 
138 
144 
150 
156 
162 
168 
174 
180 
186 
192 
198 
204 
210 
216 
222 
228 
234 
240 
246 
252 



27 bp upstream of BamHI site; chromosome 5q13.3-q14.
ttcggtggcc tctagtgaga tctggaggat ccaaggattc tgtagctaca 
gactttttcg aatgcatggc ctctttgtgg cctcccatcc ctgggaagtc 
cagtgacact gaccatctgc atgatgtcca tgaacatgtt tactggtaac 
gtggttggaa ttatgaatgt ccaaagtttg aagaggatgt tttgagcagt 
ttctgacaat aacacgatgc atagccatcc tgtatattta cttccagttc 
gtcaacttgg atcaaaatat attttgggta ttgctggcct tttcacaatt 
ttgtattcag tacagttgtc attcacttct tagacaaaga attgacaggc 
ctttgccctt tttcctactt ttgattgacc tttccagagc aagcacatta 
ccctcagttc caactcacag gatgaagtaa gggaaaatat tgctcgtgga 
taggtcctac gtttaccctc gatgctcttg ttgaatgtct tgtgattgga 
tgtcaggggt acgtcagctt gaaattatgt gctgctttgg ctgcatgtca 
actacttcgt gttcatgact ttcttcccag cttgtgtgtc cttggtatta 
gggaaagccg cgagggtcgt ccaatttggc agctcagcca ttttgcccga 
aagaagaaaa taagccgaat cctgtaactc agagggtcaa gatgattatg 
tggttcttgt tcatgctcac agtcgctgga tagctgatcc ttctcctcaa 
cagatacttc taaggtttca ttaggactgg atgaaaatgt gtccaagaga 
gtgtttccct ctggcagttt tatctctcta aaatgatcag catggatatt 
ttaccctaag tttagctctc cttctggctg tcaagtacat cttctttgaa 
cagaatctac actctcatta aaaaacccta tcacatctcc tgtagtgaca 
tcccagacaa ttgttgtaga cgtgaaccta tgctggtcag aaataaccag 
cagtagagga agagacaggg ataaaccgag aaagaaaagt tgaggttata 
tggctgaaac agatacccca aacagagcta catttgtggt tggtaactcc 
atacttcatc agtactggtg acacaggaac ctgaaattga acttcccagg 
ctaatgaaga atgtctacag atacttggga atgcagagaa aggtgcaaaa 
atgctgagat catccagtta gtcaatgcta agcatatccc agcctacaag 
tgatggaaac tcatgagcgt ggtgtatcta ttcgccgaca gttactttcc 
cagaaccttc ttctctccag tacctacctt acagggatta taattactcc 
gagcttgttg tgagaatgtt attggatata tgcccatccc tgttggagtg 
tttgcttaga tgaaaaagaa tttcaggttc caatggcaac aacagaaggt 
ccagcaccaa tagaggctgc agagcaatag gtcttggtgg aggtgccagc 
ttgcagatgg gatgactcgt ggcccagttg tgcgtcttcc acgtgcttgt 
aagtgaaagc ctggctcgaa acatctgaag ggttcgcagt gataaaggag 
gcactagcag atttgcacgt ctacagaaac ttcatacaag tatagctgga 
atatccgttt ccagtccagg tcaggggatg ccatggggat gaacatgatt 
cagagaaagc actttcaaaa cttcacgagt atttccctga aatgcagatt 
gtggtaacta ttgtactgac aagaaacctg ctgctataaa ttggatagag 
aatctgttgt ttgtgaagct gtcattccag ccaaggttgt cagagaagta 
ccacagaggc tatgattgag gtcaacatta acaagaattt agtgggctct 
ggagcatagg aggctacaac gcccatgcag caaacattgt caccgccatc 
gtggacagga tgcagcacag aatgttggta gttcaaactg tattacttta 
gtggtcccac aaatgaagat ttatatatca gctgcaccat gccatctata 
cggtgggtgg tgggaccaac ctactacctc agcaagcctg tttgcagatg 
aaggagcatg caaagataat cctggggaaa atgcccggca gcttgcccga 

FIG. 22B 



atgttgtcaa 
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cagaatttac 
ttctcaagtt 
ttgaatgaag 
gcaaagtttg 
atggcaattt 
gttggtacca 
gttcttgcca 
gagctttctc 
gttttagaag 
tctctaggct 
aacagtacag 
attgaaccaa 
gaacaagtta 
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ggaccgtaat 
aaagtcacat 
ccaagaagac 
aaggactaac 
ataaatgtga 
ctttccatgc 



ggctggggaa 
gattcacaac 
agcctgaata 
ataaaatctg 
tcactgagac 
agactcctca 
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ttgtcactta 
aggtcgaaga 
gcccgacagt 
tgaattaaaa 
agccacttgg 
gatc



tggcagcatt 
tcaatttaca 
tctgaactgg 
aagctcaatg 
tttttggctc 



ggcagcagga 
agacctccaa 
aacatgggca 
cattgtcctg 
tttcagagag 



catcttgtca 
ggagcttgca
ttgggttcta 
tggaggatga 
gtctcaggtt 
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LOCUS HUMPRCA 11725 bp DNA PRI 08-JAN-1995 

DEFINITION Human protein C gene, complete cds. 
ACCESSION M11228 
NID g190333
KEYWORDS glycoprotein; protease; protein C; serine protease.
SOURCE Human DNA, clones PC-lambda-8 and PC-lambda-6.
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 11725)
AUTHORS Foster,D.C., Yoshitake,S. and Davie,E.W.
TITLE The nucleotide sequence of the gene for human protein C
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 82 (14), 4673-4677 (1985)
MEDLINE 85270390

FEATURES Location/Qualifiers
source 1..11725
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="2q13-q21"
2131..2200
/gene="PROC"
exon <2131..2200
/gene="PROC"
/note="Protein C; G00-120-317"
/number=1
sig_peptide join(2131..2200,3464..3519)
/note="Protein C signal peptide"
CDS join(2131..2200,3464..3630,5093..5117,5210..5347,
5450..5584,8253..8395,9269..9386,10516..11105)
/note="Protein C"
/codon_start=1
/db_xref="PID:g190334"

/ translation= "MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRAN 
SFLEELRHSSLERECIEEICDFEEAKEIFQNTODTLAFWSKHVDGDQCLVLPLEHPCA 
SLCCGHGTCIDGIGSFSCIX^RSGWEGRFCQREVSFLNCSLJ3NGGCTHYCLEEVGWRRC 
SCAPGYKLGDDLLQCHPAVTCFPCGRPWKRMEKKRSHIJCRDTEDQEDQTOPRLIDGKOT 
RRGDSPWQWLLJ3SKKKIJ\CGAVLIHPSWM 

ELDLDI KEVFVHPNYSKSTTDNDIALLHLAQPATLSQT IVPICLPDSGLAERELNQAG 
QETLVTGWG YHS SREKEAKRNRTFVLNF I K I PW PHNEC S EVMSNMVS ENMLCAG I LG 

DRQDACEGDSGGPMVASFHGTWFLVGLVSWGEGCGLliiNYGvYrKVSRYIjDWIHGHIR 

DKEAPQKSWAP " 
intron 2201..3463
/note="ProC cds intron A"
exon 3464..3630
/number=2
mat_peptide join(3520..3630,5093..5117,5210..5347,5450.
8253..8395,9269..9386,10516..11102)
intron 3631..5092
/note="ProC cds intron B"
exon 5093..5117
/number=3
intron 5118..5209
/note="ProC cds intron C"
exon 5210..5347
/number=4
intron 5348..5449
FIG. 23A 
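
The CDS location of the HUMPRCA record is a `join(...)` of eight segments, so the length of the coding sequence is the sum of the segment lengths. The following is a minimal sketch in Python of that arithmetic, assuming 1-based inclusive coordinates and a location made only of plain `start..end` pairs; the helper name `parse_join` is hypothetical and does not handle the full feature-location syntax.

```python
import re

# CDS location copied from the HUMPRCA feature table, with the spacing
# introduced by the figure layout removed.
CDS_LOCATION = (
    "join(2131..2200,3464..3630,5093..5117,5210..5347,"
    "5450..5584,8253..8395,9269..9386,10516..11105)"
)


def parse_join(location):
    """Return (start, end) pairs from a simple join(a..b,c..d) location string."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


segments = parse_join(CDS_LOCATION)
lengths = [end - start + 1 for start, end in segments]
coding_bases = sum(lengths)
codons, remainder = divmod(coding_bases, 3)

print(lengths, coding_bases, codons, remainder)
```

The eight segments add up to a whole number of codons. The mat_peptide location in the same table ends at 11102, three bases before the end of the CDS at 11105, which would fit a terminal stop codon; that interpretation is an assumption rather than something the record states.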
/note="ProC cds intron D"
exon 5450..5584
/number=5
intron 5585..8252
/note="ProC cds intron E"
exon 8253..8395
/number=6
intron 8396..9268
/note="ProC cds intron F"
exon 9269..9386
/number=7
intron 9387..10515
/note="ProC cds intron G"
exon 10516..>11105
/note="Protein C"
/number=8
BASE COUNT 2444 a 3298 c 3375 g 2608 t
ORIGIN 575 bp upstream of StuI site; chromosome 2q14-q21.

1 agtgaatctg ggcgagtaac acaaaacttg agtgtcctta cctgaaaaat 
61 gggatgctat gtgccattgt gtgtgtgtgt tgggggtggg gattgggggt 
121 caattggagg tgagggtgga gcccagtgcc cagcacctat gcactgggga 
181 agcatcttct catgatttta tgtatcagaa attgggatgg catgtcattg 
241 ttttttcttg tatggtggca cataaataca tgtgtcttat aattaatggt 
301 tgacgaaata tggaatatta cctgttgtgc tgatcttggg caaactataa 
361 caaaaatgtc cccatctgaa aaacagggac aacgttcctc cctcagccag 
421 gctaaaatga gaccacatct gtcaagggtt ttgccctcac ctccctccct 
481 atccttggta ggcagaggtg ggcttcgggc agaacaagcc gtgctgagct 
541 gtgctagtgc cactgtttgt ctatggagag ggaggcctca gtgctgaggg 
601 atttgtggtt atggattaac tcgaactcca ggctgtcatg gcggcaggac 
661 cagtatctcc acgacccgcc cctgtgagtc cccctccagg caggtctatg 
721 agggagggct gcccccggga gaagagagct aggtggtgat gagggctgaa 
781 agggtgctca acaagcctga gcttggggta aaaggacaca aggccctcca 
841 ctggcagcca cagtctcagg tccctttgcc atgcgcctcc ctctttccag 
901 cccaggccca gggccattcc aacagacagt ttggagccca ggaccctcca 
961 cccacttcca cctttggggg tgtcggattt gaacaaatct cagaagcggc
1021 gtcggcaaga atggagagca gggtccggta gggtgtgcag aggccacgtg
1081 tggggagggt tccttgatct ctggccacca gggctatctc tgtggccttt
1141 tggtggtttg gggcaggggt tgaatttcca ggcctaaaac cacacaggcc 
1201 tcctggctct gcgagtaatg catggatgta aacatggaga cccaggacct 
1261 ttccgagtct ggtgcctgca gtgtactgat ggtgtgagac cctactcctg 
1321 gacagaatct gatcgatccc ctgggttggt gacttccctg tgcaatcaac 
1381 aagggttgga tttttaataa accacttaac tcctccgagt ctcagtttcc 
1441 aatggggttg acagcattaa taactacctc ttgggtggtt gtgagcctta 
1501 taatatctca tgtttactga gcatgagcta tgtgcaaagc ctgttttgag
1561 ggactaactc ctttaattct cacaacaccc tttaaggcac agatacacca 
1621 tccattttac aaatgaggaa actgaggcat ggagcagtta agcatcttgc 
1681 cctccagtaa gtgctggagc tggaatttgc accgtgcagt ctggcttcat 
1741 gtgaatcctg taaaaattgt ttgaaagaca ccatgagtgt ccaatcaacg 
1801 ttctcagccc agtcatcaga ccggcagagg cagccacccc actgtcccca
1861 aaacatcctg gcaccctctc cactgcattc tggagctgct ttctaggcag
1921 ctcagcccca cgtagagcgg gcagccgagg ccttctgagg ctatgtctct
1981 gaccctcaat tccagcttcc gcctgacggc cagcacacag ggacagccct 
2041 ttccacctgg gggtgcaggc agagcagcag cgggggtagc actgcccgga 
2101 cctcctcaga caggtgccag tgcctccaga atgtggcagc tcacaagcct 
2161 gtggccacct ggggaatttc cggcacacca gctcctcttg gtaaggccac 
2221 ccccgggacc cttgtggcct ctacaaggcc ctggtggcat ctgcccaggc 
2281 tccaccatct ctctgagccc tgggtgaggt gaggggcaga tgggaatggc 
2341 tgacaagtcc caggtaggcc agctgccaga gtgccacaca ggggctgcca 
2401 gcgtgatggc agggagcccc gcgatgacct cctaaagctc cctcctccac 
2461 tcacagagtc ccctgggcct tccctctcca cccactcact ccctcaactg 
2521 aggcccaggc taccgtccac actatccagc acagcctccc ctactcaaat 
2581 ctcatggctg ccctgcccca acccctttcc tggtctccac agccaacggg 
2641 gattcttggg gaggtccgca ggcacatggg cccctaaagc cacaccaggc 
2701 atttgtgcct ttatagagct gtttatctgc ttgggacctg cacctccacc 
2761 gtgccctcag ctcaggcata ccctcctcta ggatgccttt tcccccatcc 
acacccccaa
gagcccagga 
ccctgcccaa 
ggatgtgtca 
ggcagccttt 
atgttttaca 
gtaaagacac 
catctcagag 
gaggacccct 
agaccttccc 
ttaggcccct 
agcgagcgtg 
ctccgtcaca 
aaggaaattt 
ctcaggggcg 
ggccctctta 
gtcggccagg 
gtggggtgac 
cagccactgc 
caccctgctt 
tgtccacata 

gggggtcgga 
ccagagtcct 
ctgagacaag 
aattgtgcac 
cagggcagga 
ggacccccat 
accacagtga 
ctcacccatg 
aggcacaggc 
agcagcaacc 
gaaaaacact 
agggagtgat 
tgcagaggct 

gggtgttgct 

ggggaggggc 
ttgtctggaa 
gcccctccgc 
ctggtccaag 
cccctcggga 
tgcttggtct 
atcgacggca 
cagcgcggtg 
tgggggcgcg 
ctcaattgct 
cggcgctgta 
gcaggtgaga 
ccctgacggg 
gttgagcctt 
cccgggagct 
cctccgggcg 
gctccccagt 
ctgcgttttt 
ccttgaggag 
ttaatcaaat 
tcagcatgct 
ttttaatgtg 
atctcccctt 
ctctagtttt 
actttctttt 
tgcaatgacg 
gccgcagcct 
tgtttttagt 
caggtgatcc 
cccagcctct 



cttgatctct 
cacacctggg 
ggggagaagc 
agatggggct 
acagcagcag 
tgacggtctc 
tggctcaagg 
caaggcttcg 
gcgccaagcc 
aggctctccc 
caccaaggtg 
cccaccaggt 
gcagcctgga 
tccaaaatgt 
agctggtaac 
ggagttgtgg 
cacatgtgac 
cctaggtggg 
tgagcaccac 
ccacccatgc 
aaatcgctca 
agacagggtc 
gtggacgtgg 
gctcagaccc 
gcctgggccc 
gaccctctgg 
ctggacctcc 
ctttctgcag 
gtctctcagc 
cgtgggtctc 
ctggtacctg 
ggcttaggga 
gggactggaa 
gctgtgggag 
ccagggacgt 
agggagcacc 
gccctcccct 
acaccggctg 
cacgtcggtg 
tctctggccg 
tgcccttgga 
tcggcagctt 
agggggagag 
gcaccagcac 
ctctggacaa 
gctgtgcgcc 
agcccccaat 
cgcggcgcgg 
ggggcagcgg 
gggcgcgccc 
cccctgcgac 
ctgagcgtat 
ctctgacgtt 
agaacagaat 
ttatatatgt 
gttccttggc 
gaaattccta 
tacttcctct 
attgtctctt 
ttttcttttt 
tgatctcagc 
cccgagtagc 
agagaagggg 
acctgccttg 
ttcagggaac 



ccctcctaac 
gacccttcct 
atggggaata 
gcatgtggtg 
ccagggcttg 
atccccatgt 
tcacacagag 
tcctccaact 
atgacctaga 
agctctgctt 
agctcccctc 
gctgcggatc 
gcgggagtgc 
ggatgacaca 
cagcaggggc 
gggtggctga 
tgcaagaaac 
gactcccaca 
tgcctccccg 
ctctgctgat 
ctctgtgcct 
tgtgtcctat 
ccctaggtag 
gctctgtccc 
ccttccaagg 
cctgcaccct 
atccccacca 
gcacatatct 
cccagcagcc 
aacgtgggct 
gttaggaacg 
aaggcgcgat 
ggaggccgag 
cggacagtcg 
gggatggagg 
agctcctagc 
cccctgcccg 
caggagcctg 
agtgcgttct 
ctgaccccct 
gcacccgtgc 
cagctgcgac 
gtggatgctg 
cagctgcccg 
cggcggctgc 
tggctacaag 
acatcgccca 
ggggctcagg 
cagacgcgcc 
tccgctttcc 
ctggggccac 
ctggggcgag 
gtccggcgtg 
cccgattctg 
atgaaacttt 
atgggtcctt 
tcttctgcct 
attttctctt 
ctatttccca 
ttttgagatg 
tcaccacaac 
tgggattaca 
tttctccgtg 
gcctcctaaa 
tttctacaac 
6721 tttttggcca ggctcagcag cccagaccaa taattccagc actttgagag gctgaggtgg
6781 ??f?!5H ce cgagcctggg agtttgagac tagcctgggc aacacagtga gaccctgtct
6841 " " a "" Caa aa aaagtaaa aaaagatcta aaaatttaac tttttatttt gaaataatta
6901 t Ca g 9 a agctgca aagaaatgcc tggtgggcct gttggctgtg ggtttcctgc
6961 aa ?^^ 9 ? g aaggccctg tcattggcag aaccccagat cgtgagggct ttccttttag
7021 * aagaggactc ctccaagccc ttggaggatg gaagacgctc acccatggtg
7081 ttcggcccct cagagcaggg tggggcaggg gagctggtgc ctgtgcaggc tgtggacatt
7141 ^Sf C cctgtggtca gctaagagca ccactccttc ccgaagcggg gcctgaagtc
7201 f gcctctggtt caccttctgc aggcagggag aggggagtca agtcagtgag
7261 gagggctttc gcagtttctc ttacaaactc tcaacatgcc ctcccacctg cactgccttc
7321 ctggaagccc cacagcctcc tatggttccg tggtccagtc cttcagcttc tgggcgcccc
7381 "tcacgggc tgagattttt gctttccagt ctgccaagtc agttactgtg tccatccatc
7441 tgctgtcagc ctctggaatt gttgctgttg tgccctttcc attcttttgt tatgatgcag
7501 ™~»S^ gacgacgtcc cattgctctt ttaagtctag atatctggac tgggcattca
7561 aggcccattt tgagcagagt cgggctgacc tttcagccct cagttctcca cggagtatgc
7621 " ^cagggag gcctcacaaa catgccatgc ctattgtagc agctctccaa
7681 gaatgctcac ctccttctcc ctgtaattcc tttcctctgt gaggagctca gcagcatccc
7741 tt™ a 2 a ? " tactaatc ccagggatca cccccaacag ccctggggta caatgagctt
7801 " aa9aag " taaccaccta tgtaaggaga cacaggcagt gggcgatgct gcctggcctg
7861 " gggtggta "gtttgttg actgactgac tgactgactg gaggiggtt?
7921 ?t aa »^ 9 ^ tctcagggat tacccccaac agccctgggg tacaatgagc cttciagaag
7981 r-r^S?^ tatgtaagga cacacagcca gtgggtgatg ctgcctggtc tgactcttgc
8041 cattcagtgg cactgtttgt tgactgactg actgactgac tggctgactg gagggggttc
8101 atagctaata ttaatggagt ggtctaagta tcattggttc ciEgaicccE gclclglggc
8161 acaggctgga ggaggaccaa gacaggaggg cagtctcggg aggagtgcct
8221 tcaccacctc tgcctacctc agtgaagttc ccttgtggga ggccctggaa
8281 gcggatggag aagaagcgca gtcacctgaa acgagacaca gaagaccaag aagaccaagt
8341 a S™^ 9 ctcattgatg ggaagatgac caggcgggga gacagcccct ggcaggtggg
8401 gcaccggctc gtcacgtgct gggtccggga tcactgagtc catcctggca
8461 !f^ 9 3ggtgcagaa accgagaggg aagcgctgcc attgcgtttg ggggatgatg
8521 atgcttcagg gaaagatgga cgcaacctga ggggagagga gcagccaggg
8581 tgggtgaggg gaggggcatg ggggcatgga ggggtctgca ggagggaggg ttacagtttc
8641 o^ a9a9C ^aaagaca ctgctctgct ggcgggattt taggcagaag ccctgctgat
8701 H?H a ? a999 f Cagga 99Sag ggccgggcct gagtacccct ccagcctcca catgggaact
8761 llnt^n ?ggttcccct ctctgccagg catgggggag ataggaacca acaagtggga
8821 g tatttgccc tggggactca gactctgcaa gggtcaggac cccaaagacc cggcagccca
8881 ' g * 99ga f Cac a 9 cca 99acg gcccttcaag ataggggctg agggaggcca aggggaacat
8941 ccaggcagcc tgggggccac aaagtcttcc tggaagacac aaggcctgcc aagcctctaa
9001 g gatgagagg agctcgctgg gcgatgttgg tgtggctgag ggtgactgaa acagtatgaa
9061 fagtgcagga acagcatggg caaaggcagg aagacaccct gggacaggct gacactgtaa
9121 aatgggcaaa aatagaaaac gccagaaagg cctaagccta tgcccatatg accagggaac
9181 oro!?^ 39 ^ gcatatgaaa cccaggtgcc ctggactgga ggctgtcagg aggcagccct
9241 g ^ gatgtcat catcccaccc cattccaggt ggtcctgctg gactcaaaga agaagctggc
9301 " gcggggca gtgctcatcc acccctcctg ggtgctgaca gcggcccact gcatggatga
9361 I™ 3 .?™ 9 = tccttgtca ggcttggtat gggctggagc caggcagaag ggggctgcca
9421 93 ?? Ctg " ta 9393gacc aggcaggctg ttcaggtttg ggggaccccg ctccccaggt
9481 ™ aa9 " a g aggcttctt gagctccaca gaaggtgttt ggggggaaga ggcctatgtg
9541 trrrrnaJ gcccacccat gtacacccag tattttgcag tagggggttc tctggtgccc
9601 tgggcacagg tacctgcaca cacatgtttg tgaggggcta cacagacctt
9661 ™~ ctcccactca tgaggagcag gctgtgtggg cctcagcacc cttgggtgca
9721 gagaccagca aggcctggcc tcagggctgt gcctcccaca gactgacagg gatggagctg
9781 ^ a ? a9 " a gccctagcat ctgccaaagc cacaagctgc ttccctagca ggctgggggc
9841 a a™ tggccccgat ctatggcaat ttctggaggg ggggtctggc tcaactcttt
9901 » gaaggcaaag catattgaga aaggccaaat tcacatttcc tacagcataa
9961 ^™ Ca9 tgg f ccc 9tg gggcttggct tagaattccc aggtgctctt cccagggaac
10021 Sor^^o gat * gagagg accttctctc tcaggtggga cccggccctg tcctccctgg
10081 "gtgccgtg ttctgggggt cctcctctct gggtctcact gcccctgggg tctctccagc
10141 ? t " ccatgttcct ttgtggctct ggtctgtgtc EggggtttS aggggtctcg
10201 ggcttccctg ctgcccattc cttctctggt ctcacggctc cgtgactcct gaaaaccaac
10261 "cctttgga ttgacacctg ttggcllctc cltctggcag laaaagtcac
10321 cgtcgatagg gttccacggc atagacaggt ggctccgcgc cagtgcctgg gacgtgtggg
10381 tgcacagtct ccgggtgaac cttcttcagg ccctctccca ggcclgcagg ggcfclglfg
10441 ^ a ? 9aaagt gcca «gggg agaggctccc cgcagcccff ?ctgac?gtl
10501 ^^?? aga9 tatga «tgc ggcgctggga gaagtgggag ctggacctgg
10561 acatcaagga ggtcttcgtc caccccaact acagcaagag caccaccgac aatgacatcg
cactgctgca
cggacagcgg 
gctggggcta 
acttcatcaa 
tgtctgagaa 
acagtggggg 
gctggggtga 
acctcgactg 
cttagcgacc 
acatgtaaca 
ggagggaagt 
taactcctag 
agctgtgtgt 
agccactaga 
gtgaggcttg 
cacagaggag 
aggtctgact 
tttcgacggt 
agaaagtgtt 



cctggcccag 
ccttgcagag 
ccacagcagc 
gattcccgtg 
catgctgtgt 
gcccatggtc 
gggctgtggg 
gatccatggg 
ctccctgcag 
agcacaccgg 
aacatttact 
agcaactctg 
gttgaggggg 
gccttttcca 
accagctttc 
gaaactgagg 
ccaaaaccca 
gctcagtgtg 
ggttcagccc 



cccgccaccc 
cgcgagctca 
cgagagaagg 
gtcccgcaca 
gcgggcatcc 
gcctccttcc 
ctccttcaca 
cacatcagag 
ggctgggctt 
cctgctgttc 
gagcacctgt 
tggggtgggg 
atactctgtt 
gggctttggg 
cagctagccc 
ggtctgaaag 
ggtgcttttt 
gaggccacta 
agaat 



tctcgcagac 
atcaggccgg 
aggccaagag 
atgagtgcag 
tcggggaccg 
acggcacctg 
actacggcgt 
acaaggaagc 
ttgcatggca 
tgtccttcca 
tgtatgtcac 
aggagcagat 
tatgaaaaag 
aagagcctgt 
agctatgagg 
gtttacatgg 
tctgttctcc 
ttagctctgt 



catagtgccc 
ccaggagacc 
aaaccgcacc 
cgaggtcatg 
gcaggatgcc 
gttcctggtg 
ttacaccaaa 
cccccagaag 
atggatggga 
tccctctttt 
atgccttatg 
ccaagttttg 
aataaaaaac 
gcaagccggg 
tagacatgtt 
tggagccagg 
actgtcctgg 
agggaagcag 



atctgcctcc 
ctcgtgacgg 
ttcgtcctca 
agcaacatgg 
tgcgagggcg 
ggcctggtga 
gtcagccgct 
agctgggcac 
cattaaaggg 
gggctcttct 
aatagaatct 
cggggtctaa 
acaaccacga 
gatgctgaag 
tagctcatat 
attcaaatct 
aggacagctg 
ccagagaccc 
LOCUS HUMLCAT 1744 bp mRNA PRI 07-JAN-1995
DEFINITION Human lecithin-cholesterol acyl transferase mRNA, complete cds, with
5' and 3' flanking DNA sequences.
ACCESSION M12625
NID g187022
KEYWORDS lecithin cholesterol acyl transferase
SOURCE Human adult liver (library of A.Ullrich and L.Coussens), cDNA to
mRNA, clones PL[2,4,10,12,19], and DNA.
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1744)
AUTHORS McLean,J., Fielding,C., Drayna,D., Dieplinger,H., Baer,B., Kohr,W.,
Henzel,W. and Lawn,R.
TITLE Cloning and expression of human lecithin-cholesterol
acyl transferase cDNA
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 83 (8), 2335-2339 (1986)
MEDLINE 86205950
COMMENT Draft entry and sequence in computer readable form for [1] kindly
provided by J.W.McLean, 24-JUL-1986.
Because only the 5' and 3' flanking sequences were determined from DNA, it
is not known whether this gene contains introns.
FEATURES Location/Qualifiers
source 1..1744
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="16q22.1"
mRNA <257..1610
/note="LCAT mRNA"
sig_peptide 268..339
/gene="LCAT"
/note="lecithin-cholesterol acyl transferase signal peptide"
gene 268..1590
/gene="LCAT"
CDS 268..1590
/gene="LCAT"
/note="lecithin-cholesterol acyl transferase precursor (EC 2.3.1.43)"
/codon_start=1
/db_xref="GDB:G00-119-359"
/db_xref="PID:g307117"

/translation="MGPPGSPWQWVTLLLGLLLPPAAPFWLLNVLFPPHTTPKAELSN
HTRPVILVPGCLGNQLEAKLDKPDVVNWMCYRKTEDFFTIWLDLNMFLPLGVDCWIDN
TRVVYNRSSGLVSNAPGVQIRVPGFGKTYSVEYLDSSKLAGYLHTLVQNLVNNGYVRD
ETVRAAPYDWRLEPGQQEEYYRKLAGLVEEMHAAYGKPVFLIGHSLGCLHLLYFLLRQ
PQAWKDRFIDGFISLGAPWGGSIKPMLVLASGDNQGIPIMSSIKLKEEQRITTTSPWM
FPSRMAWPEDHVFISTPSFNYTGRDFQRFFADLHFEEGWYMWLQSRDLLAGLPAPGVE
VYCLYGVGLPTPRTYIYDHGFPYTDPVGVLYEDGDDTVATRSTELCGLWQGRQPQPVH
LLPLHGIQHLNMVFSNLTLEHINAILLGAYRQGPPASPTASPEPPPPE"
mat_peptide 340..1587
/gene="LCAT"
/note="lecithin-cholesterol acyl transferase"
BASE COUNT 324 a 589 c 475 g 356 t

ORIGIN 30 bp upstream of StyI recognition sequence.

1 tgaggcctga ctttttcaat aaaacattgt gtagttctgg gcctcctgct gccccggctc 
61 tgtttcccct ggcgccaaga gaagaaggcg gaactgaacc caggcccaga gccggctccc 
121 tgaggctgtg cccctttccg gcaatctctg gccacaaccc ccactggcca ggccgtccct 
181 cccactggcc ctagggcccc tcccactccc acaccagata aggacagccc agtgccgctt 
241 tctctggcag taggcaccag ggctggaatg gggccgcccg gctccccatg gcagtgggtg 
301 acgctgctgc tggggctgct gctccctcct gccgccccct tctggctcct caatgtgctc 
361 ttccccccgc acaccacgcc caaggctgag ctcagtaacc acacacggcc cgtcatcctc 
421 gtgcccggct gcctggggaa tcagctagaa gccaagctgg acaaaccaga tgtggtgaac 
481 tggatgtgct accgcaagac agaggacttc ttcaccatct ggctggatct caacatgttc 
541 ctaccccttg gggtagactg ctggatcgat aacaccaggg ttgtctacaa ccggagctct 
601 gggctcgtgt ccaacgcccc tggtgtccag atccgcgtcc ctggctttgg caagacctac 
661 tctgtggagt acctggacag cagcaagctg gcagggtacc tgcacacact ggtgcagaac 
721 ctggtcaaca atggctacgt gcgggacgag actgtgcgcg ccgcccccta tgactggcgg 
781 ctggagcccg gccagcagga ggagtactac cgcaagctcg cagggctggt ggaggagatg 
841 cacgctgcct atgggaagcc tgtcttcctc attggccaca gcctcggctg tctacacttg 
901 ctctatttcc tgctgcgcca gccccaggcc tggaaggacc gctttattga tggcttcatc 
961 tctcttgggg ctccctgggg tggctccatc aagcccatgc tggtcttggc ctcaggtgac 
1021 aaccagggca tccccatcat gtccagcatc aagctgaaag aggagcagcg cataaccacc 
1081 acctccccct ggatgtttcc ctctcgcatg gcgtggcctg aggaccacgt gttcatttcc 
1141 acacccagct tcaactacac aggccgtgac ttccaacgct tctttgcaga cctgcacttt 
1201 gaggaaggct ggtacatgtg gctgcagtca cgtgacctcc tggcaggact cccagcacct 
1261 ggtgtggaag tatactgtct ttacggcgtg ggcctgccca cgccccgcac ctacatctac 
1321 gaccacggct tcccctacac ggaccctgtg ggtgtgctct atgaggatgg tgatgacacg 
1381 gtggcgaccc gcagcaccga gctctgtggc ctgtggcagg gccgccagcc acagcctgtg 
1441 cacctgctgc ccctgcacgg gatacagcat ctcaacatgg tcttcagcaa cctgaccctg 
1501 gagcacatca atgccatcct gctgggtgcc taccgccagg gtccccctgc atccccgact 
1561 gccagcccag agcccccgcc tcctgaataa agaccttcct ttgctaccgt aagccctgat 
1621 ggctatgttt caggttgaag ggaggcacta gagtcccaca ctaggtttca ctcctcacca 
1681 gccacaggct cagtgctgtg tgcagtgagg caagatgggc tctgctgagg cctgggactg 
1741 agct 
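
The relationship between a CDS location and its /translation qualifier can be checked directly from the listed bases. The following is a minimal sketch, not part of the original record: it assumes the standard genetic code, 1-based inclusive coordinates as used in the feature table, and a hypothetical helper name `translate_cds`. It is applied to the row of the HUMLCAT listing numbered 241, in which the CDS begins at base 268.

```python
# Standard genetic code in the conventional t/c/a/g ordering
# (first base varies slowest, third base fastest).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(sequence: str, start: int, end: int) -> str:
    """Translate bases start..end (1-based, inclusive), stopping at a stop codon.

    Hypothetical helper for illustration; it handles only a single
    contiguous span, not join(...) locations.
    """
    cds = sequence.lower()[start - 1:end]
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Bases 241..300 of the HUMLCAT listing, with the spaces removed.
row_241 = "tctctggcagtaggcaccagggctggaatggggccgcccggctccccatggcagtgggtg"

# Base 268 of the record is base 28 of this row (268 - 241 + 1).
print(translate_cds(row_241, 28, 60))
```

Passing the full 1744-base sequence with `start=268` and `end=1590` would translate the complete precursor in the same way; a spliced location such as the Protein C `join(...)` would first need its exon spans concatenated.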
LOCUS HUMHCII 2182 bp mRNA PRI 08-NOV-1994
DEFINITION Human heparin cofactor II (HC-II) mRNA, complete cds.
ACCESSION M12849 M19241
NID g183909
KEYWORDS heparin cofactor II; protease inhibitor.
SOURCE Human fetal liver, cDNA to mRNA, clone lambda-HCII.7 [1]; adult
liver, cDNA to mRNA, clone lambda HCII.7.1 [3].
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1025 to 2182)
AUTHORS Inhorn,R.C. and Tollefsen,D.M.
JOURNAL Unpublished (1986)
REFERENCE 2 (bases 1025 to 2182)
AUTHORS Inhorn,R.C. and Tollefsen,D.M.
TITLE Isolation and characterization of a partial cDNA clone for heparin
cofactor II
JOURNAL Biochem. Biophys. Res. Commun. 137 (1), 431-436 (1986)
MEDLINE 86242236
REFERENCE 3 (bases 1 to 2182)
AUTHORS Blinder,M.A., Marasa,J.C., Reynolds,C.H., Deaven,L.L. and
Tollefsen,D.M.
TITLE Heparin cofactor II: cDNA sequence, chromosome localization,
restriction fragment length polymorphism, and expression in
Escherichia coli
JOURNAL Biochemistry 27 (2), 752-759 (1988)
MEDLINE 88163663
COMMENT [1] revises [2].
Draft entry and computer-readable sequence of [2] kindly provided
by D.M.Tollefsen, 18-AUG-1986.
Draft entry and computer-readable sequence of [3] kindly provided
by Blinder,M.A., 24-MAR-1988.
FEATURES Location/Qualifiers
source 1..2182
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="22q11.2"
mRNA <1..2182
/note="heparin cofactor II mRNA"
sig_peptide 29..85
/gene="HCF2"
/note="heparin cofactor II signal protein"
gene 29..1528
/gene="HCF2"
CDS 29..1528
/gene="HCF2"
/note="heparin cofactor II precursor"
/codon_start=1
/db_xref="GDB:G00-120-038"
/db_xref="PID:g183910"

/translation="MKHSLNALLIFLIITSAWGGSKGPLDQLEKGGETAQSADPQWEQ
LNNKNLSMPLLPADFHKENTVTNDWIPEGEEDDDYLDLEKIFSEDDDYIDIVDSLSVS
PTDSDVSAGNILQLFHGKSRIQRLNILNAKFAFNLYRVLKDQVNTFDNIFIAPVGIST
AMGMISLGLKGETHEQVHSILHFKDFVNASSKYEITTIHNLFRKLTHRLFRRNFGYTL
RSVNDLYIQKQFPILLDFRTKVREYYFAEAQIADFSDPAFISKTNNHIMKLTKGLIKD
ALENIDPATQMMILNCIYFKGSWVNKFPVEMTHNHNFRLNEREVVKVSMMQTKGNFLA
ANDQELDCDILQLEYVGGISMLIVVPHKMSGMKTLEAQLTPRVVERWQKSMTNRTREV
LLPKFKLEKNYNLVESLKLMGIRMLFDKNGNMAGISDQRIAIDLFKHQGTITVNEEGT
QATTVTTVGFMPLSTQVRFTVDRPFLFLIYEHRTSCLLFMGRVANPSRS"

mat_peptide 86..1525
/gene="HCF2"
/note="heparin cofactor II"

BASE COUNT 603 a 581 c 500 g 498 t 

ORIGIN 142 bp upstream from PstI site; chromosome 22. 

1 cgaaacacag agctttagct ccgccaaaat gaaacactca ttaaacgcac ttctcatttt 

61 cctcatcata acatctgcgt ggggtgggag caaaggcccg ctggatcagc tagagaaagg 

121 aggggaaact gctcagtctg cagatcccca gtgggagcag ttaaataaca aaaacctgag 

181 catgcctctt ctccctgccg acttccacaa ggaaaacacc gtcaccaacg actggattcc 

241 agagggggag gaggacgacg actatctgga cctggagaag atattcagtg aagacgacga 

301 ctacatcgac atcgtcgaca gtctgtcagt ttccccgaca gactctgatg tgagtgctgg 

361 gaacatcctc cagctttttc atggcaagag ccggatccag cgtcttaaca tcctcaacgc 

421 caagttcgct ttcaacctct accgagtgct gaaagaccag gtcaacactt tcgataacat 

481 cttcatagca cccgttggca tttctactgc gatgggtatg atttccttag gtctgaaggg 

541 agagacccat gaacaagtgc actcgatttt gcattttaaa gactttgtta atgccagcag 

601 caagtatgaa atcacgacca ttcataatct cttccgtaag ctgactcatc gcctcttcag 

661 gaggaatttt gggtacacac tgcggtcagt caatgacctt tatatccaga agcagtttcc 

721 aatcctgctt gacttcagaa ctaaagtaag agagtattac tttgctgagg cccagatagc 

781 tgacttctca gaccctgcct tcatatcaaa aaccaacaac cacatcatga agctcaccaa 

841 gggcctcata aaagatgctc tggagaatat agaccctgct acccagatga tgattctcaa 

901 ctgcatctac ttcaaaggat cctgggtgaa taaattccca gtggaaatga cacacaacca 

961 caacttccgg ctgaatgaga gagaggtagt taaggtttcc atgatgcaga ccaaggggaa 

1021 cttcctcgca gcaaatgacc aggagctgga ctgcgacatc ctccagctgg aatacgtggg 

1081 gggcatcagc atgctaattg tggtcccaca caagatgtct gggatgaaga ccctcgaagc 

1141 gcaactgaca ccccgggtgg tggagagatg gcaaaaaagc atgacaaaca gaactcgaga 

1201 agtgcttctg ccgaaattca agctggagaa gaactacaat ctagtggagt ccctgaagtt 

1261 gatggggatc aggatgctgt ttgacaaaaa tggcaacatg gcaggcatct cagaccaaag 

1321 gatcgccatc gacctgttca agcaccaagg cacgatcaca gtgaacgagg aaggcaccca 

1381 agccaccact gtgaccacgg tggggttcat gccgctgtcc acccaagtcc gcttcactgt 

1441 cgaccgcccc tttcttttcc tcatctacga gcaccgcacc agctgcctgc tcttcatggg 

1501 aagagtggcc aaccccagca ggtcctagag gtggaggtct aggtgtctga agtgccttgg 

1561 gggcaccctc attttgtttc cattccaaca acgagaacag agatgttctg gcatcattta 

1621 cgtagtttac gctaccaatc tgaattcgag gcccatatga gaggagctta gaaacgacca 

1681 agaagagagg cttgttggaa tcaattctgc acaatagccc atgctgtaag ctcatagaag 

1741 tcactgtaac tgtagtgtgt ctgctgttac ctagagggtc tcacctcccc actcttcaca 

1801 gcaaacctga gcagcgcgtc ctaagcacct cccgctccgg tgaccccatc cttgcacacc 

1861 tgactctgtc actcaagcct ttctccacca ggcccctcat ctgaatacca agcacagaaa 

1921 tgagtggtgt gactaattcc ttacctctcc caaggagggt acacaactag caccattctt 

1981 gatgtccagg gaagaagcca cctcaagaca tatgaggggt gccctgggct aatgttaggg 

2041 cttaattttc tcaaagcctg acctttcaaa tccatgatga atgccatcag tccctcctgc 

2101 tgttgcctcc ctgtgacctg gaggacagtg tgtgccatgt ctcccatact agagataaat 

2161 aaatgtagcc acatttactg tg 
LOCUS HUMFVA 6893 bp mRNA PRI 08-AUG-1995
DEFINITION Human coagulation factor V mRNA, complete cds.
ACCESSION M14335 M17785
NID g182797
KEYWORDS coagulation factor V; factor V; glycoprotein.
SOURCE Human liver (normal hepatocyte and HepG-2 cells), cDNA to mRNA,
clones HV3.37, HV0.85, HV1.66 and HV2.97.
ORGANISM Homo sapiens
Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 3636 to 6893)
AUTHORS Kane,W.H. and Davie,E.W.
TITLE Cloning of a cDNA coding for human factor V, a blood coagulation
factor homologous to factor VIII and ceruloplasmin
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 83 (18), 6800-6804 (1986)
MEDLINE 86313665
REFERENCE 2 (bases 1 to 4876)
AUTHORS Kane,W.H., Ichinose,A., Hagen,F.S. and Davie,E.W.
TITLE Cloning of cDNAs coding for the heavy chain region and connecting
region of human factor V, a blood coagulation factor with four
types of internal repeats
JOURNAL Biochemistry 26 (20), 6508-6514 (1987)
MEDLINE 88107560
COMMENT Draft entry and computer-readable sequence [1] kindly submitted by
W.H.Kane, 13-JUN-1988.
FEATURES Location/Qualifiers
source 1..6893
/organism="Homo sapiens"
/db_xref="taxon:9606"
/map="1q21-q25"
gene 77..6751
/gene="F5"
sig_peptide 77..160
/gene="F5"
/note="factor V signal peptide"
CDS 77..6751
/gene="F5"
/note="factor V precursor"
/codon_start=1
/db_xref="GDB:G00-119-896"
/db_xref="PID:g182798"
/translation="MFPGCPRLWVLWLGTSWVGWGSQGTEAAQLRQFYVAAQGISWS

YRPEPTNSSLNLSVTSFKKr\TYTlEYEPYFKKEKPQSTISGLLGPTL 

KNKADKPLSIHPQGIRYSKLSEGASYLDHTFPAEKMDDAVAPGREYTYEWSISEDSGP 

THDDPPCLTHIYYSHENLIEDFNSGLIGPLLICKKGTLTEGGTQKTFDKQIVLLFAVF 

DESKSWSQSSSIJmWGYWGTMPDITVCAHDHISWHLLGMSSGPELFSIHFNGQVL 

EQNHHKVSAITLVSATSTTANMTVGPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTR 

NLKKITREQRRHMiaWEYFIAAEEVIWDYAFV^^ 

KVMYTQYEDESFTKHTVNPNMKEDGII^PIIRA^ 

TFSPYEDEVNSSFTSGRNNTMIRAVQPGETYTYKWNILEFDEPTENDAQCLTRPYYSD 
VDIMRDIASGLIGLLLICKSRSLDRRGIQRAADIEQQAVFAVFDENKSWYLEDNINKF 
CEa^DEVTCRDDPKFYESNIMSTINGWPESITTUSFCFDDTVQWHFCSVGTQNEILTI 
HFTGHSF I YGKRHEDTLTLF PMRGESVTVTMDNVGTWMLTSMNS S PRSKKLRLKFRDV

KCIPDDDEDSYEIFEPPESTVMATRKMHDRLEPEDEESDADYT^^ 

NSSLNQEEEEFNLTALALENGTEFVSSNTDIIVGSNYSSPSNISKFTVNNLAEPQKAP 

SHQQATTAGSPLRHLIGKNSVI^SSTAEHSSPYSEDPIEDPLQPDVTGIRLLSLGAGE 

FRSQEHAKJIKGPKVERDQAAKHRFSWMKLL^ 

TGSPSRMRPWEDPPSDI^LLKQSNSSKILVGRWHIASEKGSYEIIQDTDEDTAVNN^ 
ISPQNASRAWGESTPLANKPGKQSGHPKFPRVRHKSLQVRQDGGKSRLKKSQFLIKTR 
KKKKEKHTHHAPLSPRTFHPLRSEAYNTFSERRLKHSLVLHKSNETSLPTDLNQTLPS 
MDFGWIASLPDHNQNSSNDTGQASCPPGLYQTVPPEEHYQTFPIQDPDQMHSTSDPSH 
RS S S PELSEMLEYDRSHKSF PTDI SQMS PS SEHEVWQTVI S PDLSQVTLS PELSQTNL 
SPDLSHTTLSPELIQRNLSPALGQMPISPDLSHTTLSPDLSHTTLSLDLSQTNLSPEL 
SQTNLSPALGQMPLSPDLSHTTISLDFSQTNLSPELSHMTLSPELSQTNLSPALGQMP 
ISPDLSHTTLSLDFSQTNLSPELSQTNLSPAI^QMPLSPDPSHTTLSLDLSQTNLSPE 
LSQTNLSPDLSEMPLFADLSQIPLTPDLDQMTLSPDLGETDLSPNFGQMSLSPDLSQV 
TLSPDISDTTLLPDLSQISPPPDLDQIFYPSESSQSLLLQEFNESFPYPDLGQMPSPS 
SPTLNTDTFLSKEFNPLVIVGLSKDGTDYIEIIPKEEVQSSEDDYAEIDYVPYDDPYKT 
DVRTNINSSRDPDNIAAWYLRSNNGNRRNYYIAAEEISWDYSEFVQRETDIEDSDDIP 
EimTKKWFRKYLDSTFTKRDPRGEYEEHIXSIIXSPIIRAE 

SLHAHGLSYEKSSEGKTYEDDSPEWFKEDNAVQPNSSYTYVWHATERSGPESPGSACR 

AWAYYSAVNPEKDIHSGLIGPLLICQKGILHKDSNMPVDMREFVLLFMTFDEKKSWYY 

EKKSRSSWRLTSSEMKKSHEFliAINGMIYSLPGLKMYEQEWTOIJILLNIGGSQDIHW 

HFHGQTIJJENGNKQHQLGVWPLLPGSFKTLEMKASK^ 

LIMDRIXRMPMGLSTGIISDSQIKASEFIX3YWEPRLARL 

SKPWIQVDMQKEVIITGIQTQGAKHYUCSCYTTEFYVAYSSNQINWQIFKGNSTR^ 
YFNGNSDASTIKENQFDPPIVARYIRISPTRAYNRPTLRLELQGCEVNGCSTPLGMEN 
GKIENKQITASSFKKSWWGDYWEPFRARLNAQGRVNAWQ 

ITAI ITQGCKSLSSEMYVKSYTIHYSEQGVEWKPYRLKSSMVDKIFEGNTNTKGHVKN 

FFNPPI I SRFIRVI PKTWNQS IALRLELFGCDI Y " 
mat_peptide 161..6748
/gene="F5"
/note="factor V"
variation 3723..4024
/gene="F5"
/note="ccctt in clone HV2.97 [1]"
/replace="ccctt"
BASE COUNT 2090 a 1700 c 1423 g 1680 t

ORIGIN 270 bp upstream of AccI site; chromosome 1q21-q25.

1 ctccgggctg tcccagctcg gcaagcgctg cccaggtcct ggggtggtgg cagccagcgg 
61 gagcaggaaa ggaagcatgt tcccaggctg cccacgcctc tgggtcctgg tggtcttggg 
121 caccagctgg gtaggctggg ggagccaagg gacagaagcg gcacagctaa ggcagttcta 
181 cgtggctgct cagggcatca gttggagcta ccgacctgag cccacaaact caagtttgaa 
241 tctttctgta acttccttta agaaaattgt ctacagagag tatgaaccat attttaagaa 
301 agaaaaacca caatctacca tttcaggact tcttgggcct actttatatg ctgaagtcgg 
361 agacatcata aaagttcact ttaaaaataa ggcagataag cccttgagca tccatcctca 
421 aggaattagg tacagtaaat tatcagaagg tgcttcttac cttgaccaca cattccctgc 
481 agagaagatg gacgacgctg tggctccagg ccgagaatac acctatgaat ggagtatcag 
541 tgaggacagt ggacccaccc atgatgaccc tccatgcctc acacacatct attactccca 
601 tgaaaatctg atcgaggatt tcaactctgg gctgattggg cccctgctta tctgtaaaaa 
661 agggacccta actgagggtg ggacacagaa gacgtttgac aagcaaatcg tgctactatt 
721 tgctgtgttt gatgaaagca agagctggag ccagtcatca tccctaatct acacagtcaa 
781 tggatatgtg aatgggacaa tgccagatat aacagtttgt gcccatgacc acatcagctg 
841 gcatctgctg ggaatgagct cggggccaga attattctcc attcatttca acggccaggt 
901 cctggagcag aaccatcata aggtctcagc catcaccctt gtcagtgcta catccactac 
961 cgcaaatatg actgtgggcc cagagggaaa gtggatcata tcttctctca ccccaaaaca 
1021 tttgcaagct gggatgcagg cttacattga cattaaaaac tgcccaaaga aaaccaggaa 
1081 tcttaagaaa ataactcgtg agcagaggcg gcacatgaag aggtgggaat acttcattgc 
1141 tgcagaggaa gtcatttggg actatgcacc tgtaatacca gcgaatatgg acaaaaaata 
1201 caggtctcag catttggata atttctcaaa ccaaattgga aaacattata agaaagttat 
1261 gtacacacag tacgaagatg agtccttcac caaacataca gtgaatccca atatgaaaga 
1321 agatgggatt ttgggtccta ttatcagagc ccaggtcaga gacacactca aaatcgtgtt 
1381 caaaaatatg gccagccgcc cctatagcat ttaccctcat ggagtgacct tctcgcctta
1441 tgaagatgaa gtcaactctt ctttcacctc aggcaggaac aacaccatga tcagagcagt 
1501 tcaaccaggg gaaacctata cttataagtg gaacatctta gagtttgatg aacccacaga 
1561 aaatgatgcc cagtgcttaa caagaccata ctacagtgac gtggacatca tgagagacat 
1621 cgcctctggg ctaataggac tacttctaat ctgtaagagc agatccctgg acaggcgagg 
1681 aatacagagg gcagcagaca tcgaacagca ggctgtgttt gctgtgtttg atgagaacaa 
1741 aagctggtac cttgaggaca acatcaacaa gttttgtgaa aatcctgatg aggtgaaacg 
1801 tgatgacccc aagttttatg aatcaaacat catgagcact atcaatggct atgtgcctga 
1861 gagcataact actcttggat tctgctttga tgacactgtc cagtggcact tctgtagtgt 
1921 ggggacccag aatgaaattt tgaccatcca cttcactggg cactcattca tctatggaaa 
1981 gaggcatgag gacaccttga ccctcttccc catgcgtgga gaatctgtga cggtcacaat 
2041 ggataatgtt ggaacttgga tgttaacttc catgaattct agtccaagaa gcaaaaagct 
2101 gaggctgaaa ttcagggatg ttaaatgtat cccagatgat gatgaagact catatgagat 
2161 ttttgaacct ccagaatcta cagtcatggc tacacggaaa atgcatgatc gtttagaacc 
2221 tgaagatgaa gagagtgatg ctgactatga ttaccagaac agactggctg cagcattagg 
2281 aattaggtca ttccgaaact catcattgaa ccaggaagaa gaagagttca atcttactgc 
2341 cctagctctg gagaatggca ctgaattcgt ttcttcgaac acagatataa ttgttggttc 
2401 aaattattct tccccaagta atattagtaa gttcactgtc aataaccttg cagaacctca 
2461 gaaagcccct tctcaccaac aagccaccac agctggttcc ccactgagac acctcattgg 
2521 caagaactca gttctcaatt cttccacagc agagcattcc agcccatatt ctgaagaccc 
2581 tatagaggat cctctacagc cagatgtcac agggatacgt ctactttcac ttggtgctgg 
2641 agaattcaga agtcaagaac atgctaagcg taagggaccc aaggtagaaa gagatcaagc 
2701 agcaaagcac aggttctcct ggatgaaatt actagcacat aaagttggga gacacctaag 
2761 ccaagacact ggttctcctt ccggaatgag gccctgggag gaccttccta gccaagacac 
2821 tggttctcct tccagaatga ggccctggga ggaccctcct agtgatctgt tactcttaaa 
2881 acaaagtaac tcatctaaga ttttggttgg gagatggcat ttggcttctg agaaaggtag 
2941 ctatgaaata atccaagata ctgatgaaga cacagctgtt aacaattggc tgatcagccc 
3001 ccagaatgcc tcacgtgctt ggggagaaag cacccctctt gccaacaagc ctggaaagca 
3061 gagtggccac ccaaagtttc ctagagttag acataaatct ctacaagtaa gacaggatgg 
3121 aggaaagagt agactgaaga aaagccagtt tctcattaag acacgaaaaa agaaaaaaga 
3181 gaagcacaca caccatgctc ctttatctcc gaggaccttt caccctctaa gaagtgaagc 
3241 ctacaacaca ttttcagaaa gaagacttaa gcattcgttg gtgcttcata aatccaatga 
3301 aacatctctt cccacagacc tcaatcagac attgccctct atggattttg gctggatagc 
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3361 
3421 
3481 
3541 
3601 
3661 
3721 
3781 
3841 
3901 
3961 
4021 
4081 
4141 
4201 
4261 
4321 
4381 
4441 
4501 
4561 
4621 
4681 
4741 
4801 
4861 
4921 
4981 
5041 
5101 
5161 
5221 
5281 
5341 
5401 
5461 
5521 
5581 
5641 
5701 
5761 
5821 
5881 
5941 
6001 
6061 
6121 
6181 
6241 
6301 
6361 
6421 
6481 
6541 
6601 
6661 
6721 
6781 
6841 



ctcacttcct 
aggtctttat 
tgatcaaatg 
aatgcttgag 
ttcctcagaa 
ctctccagaa 
agaactcatt 
cagccataca 
gacaaacctc 
cctttctcca 
tccagaactc 
cctcggtcag 
ccagacaaac 
gcccctttct 
ctctccagaa 
agatctcagt 
tggtgagaca 
ggtgactctc 
acctcctcca 
tcaagaattt 
tcctactctc 
cagtaaagat 
agatgactat 
gacaaacatc 
caatggaaac 
atttgtacaa 
taagaaagta 
ggagtatgaa 
tatccaagtt 
ttcctatgaa 
ggaagataat 
atcagggcca 
cccagaaaaa 
actacataag 
ctttgatgaa 
atcctcagaa 
gcctggcctg 
ctcccaagac 
acagcaccag 
ggcatcaaaa 
gatgcaaacg 
tggtatcata 
attagcaaga 
agaatttgcc 
gatccagacc 
agcttacagt 
gatgtatttt 
tattgtggct 
attggaactg 
aaagatagaa 
ctgggaaccc 
ggcaaacaac 
aattataaca 
ccactacagt 
caagattttt 
aatcatttcc 
cctggaactc 
actctttaag 
aacagttttc 



gaccataatc 
cagacagtgc 
cactctactt 
tatgaccgaa 
catgaagtct 
ctcagccaga 
cagagaaacc 
aecctttctc 
tctccagaac 
gacctcagcc 
agccatatga 
atgcccattt 
ctctctccag 
ccagacccca 
ctcagtcaga 
caaattcccc 
gatctttccc 
tctccagaca 
gaccttgatc 
aatgagtctt 
aatgatactt 
ggtacagatt 
gctgaaattg 
aactcctcca 
agaagaaatt 
agggaaacag 
gtttttcgaa 
gagcatctcg 
cgttttaaaa 
aaatcatcag 
gctgttcagc 
gaaagtcctg 
gatattcact 
gacagcaaca 
aagaagagct 
atgaaaaaat 
aaaatgtatg 
attcacgtgg 
ttaggggtct 
cctggctggt 
ccatttctta 
tctgattcac 
ttaaacaatg 
tctaaacctt 
caaggtgcca 
tccaaccaga 
aatggcaatt 
agatatatta 
caaggttgtg 
aacaagcaaa 
ttccgtgccc 
aataagcagt 
cagggctgca 
gagcagggag 
gaaggaaata 
aggtttatcc 
tttggctgtg 
acctcaaacc 
cactatttct 



agaattcctc 
ccccagagga 
cagaccccag 
gtcacaagtc 
ggcagacagt 
caaacctctc 
tttccccagc 
cagacctcag 
tcagtcagac 
atacaaccat 
ctctctctcc 
ctccagacct 
aactcagtca 
gccatacaac 
caaacctttc 
ttaccccaga 
caaactttgg 
tcagtgacac 
agatattcta 
ttccttatcc 
ttctatcaaa 
acattgagat 
attatgtgcc 
gagatcctga 
attacattgc 
atattgaaga 
agtacctcga 
gaattcttgg 
atttagcatc 
agggaaagac 
caaatagcag 
gctctgcctg 
caggcttgat 
tgcctgtgga 
ggtactatga 
cccatgagtt 
agcaagagtg 
ttcactttca 
ggccccttct 
ggctcctaaa 
tcatggacag 
agatcaaggc 
gtggatctta 
ggatccaggt 
aacactacct 
tcaactggca 
cagatgcctc 
ggatctctcc 
aggtaaatgg 
tcacagcttc 
gtctgaatgc 
ggctagaaat 
agtctctgtc 
tggaatggaa 
ctaataccaa 
gtgtcattcc 
atatttacta 
atttagaatg 
ctttcttttc 



aaatgacact 
acactatcaa 
tcacagatcc 
cttccccaca 
catctctcca 
tccagacctc 
cctcggtcag 
ccatacaacc 
aaacctttct 
ttctctagac 
agaactcagt 
cagccataca 
aacaaacctt 
cctttctcta 
cccagacctc 
cctcgaccag 
tcagatgtcc 
cacccttctc 
cccttctgaa 
agaccttggt 
ggaatttaat 
cattccaaag 
ctatgatgac 
caacattgca 
tgctgaagaa 
ctctgatgat 
cagcactttt 
tcctattatc 
cagaccgtat 
ttatgaagat 
ttatacctac 
tcgggcttgg 
aggtcccctc 
catgagagaa 
aaagaagtcc 
tcacgccatt 
ggtgaggtta 
cggccagacc 
gcctggttca 
cacagaggtt 
agactgtagg 
ttcagagttt 
taatgcttgg 
ggacatgcaa 
gaagtcctgc 
gatcttcaaa 
tacaataaaa 
aactcgagcc 
atgttccaca 
ttcgtttaag 
ccagggacgt 
tgatctactc 
ctctgaaatg 
accatacagg 
aggacatgtg 
taaaacatgg 
gaattgaaca 
ggcaatgtat 
tattagtgaa 



ggtcaggcaa 
acattcccca 
tcttctccag 
gatataagtc 
gacctcagcc 
agccacacga 
atgcccattt 
ctttctttag 
ccagccctcg 
ttcagccaga 
cagacaaacc 
accctttctc 
tccccagccc 
gacctcagcc 
agtgagatgc 
atgacacttt 
ctttccccag 
ccggatctca 
tctagtcagt 
cagatgccat 
ccactggtta 
gaagaggtcc 
ccctacaaaa 
gcatggtacc 
atatcctggg 
attccagaag 
accaaacgtg 
agagctgaag 
tctctacatg 
gactctcctg 
gtatggcatg 
gcctactact 
ctaatctgcc 
tttgtcttac 
cgaagttctt 
aatgggatga 
cacctgctga 
ttgctggaaa 
tttaaaactc 
ggagaaaacc 
atgccaatgg 
ctgggttact 
agtgtagaaa 
aaggaagtca 
tataccacag 
gggaacagca 
gagaatcagt 
tataacagac 
cccctgggta 
aaatcttggt 
gtgaatgcct 
aagatcaaga 
tatgtaaaga 
ctgaaatcct 
aagaactttt 
aatcaaagta 
ttcaaaaacc 
tttacgctgt 
taaaatttta 



gctgtcctcc 
ttcaagaccc 
agctcagtga 
aaatgtcccc 
aggtgaccct 
ctctctctcc 
ctccagacct 
acctcagcca 
gtcagatgcc 
caaacctctc 
tttccccagc 
tagacttcag 
tcggtcagat 
agacaaacct 
ccctctttgc 
ctccagacct 
acctcagcca 
gccagatatc 
cattgcttct 
ctccttcatc 
tagtgggcct 
agagcagtga 
ctgatgttag 
tccgcagcaa 
attattcaga 
ataccacata 
atcctcgagg 
tggatgatgt 
cccatggact 
aatggtttaa 
ccactgagcg 
cagctgtgaa 
aaaaaggaat 
tatttatgac 
ggagactcac 
tctacagctt 
acataggcgg 
atggcaataa 
ttgaaatgaa 
agagagcagg 
gactaagcac 
gggagcccag 
aacttgcagc 
taatcacagg 
agttctatgt 
caaggaatgt 
ttgacccacc 
ctacccttcg 
tggaaaatgg 
ggggagatta 
ggcaagccaa 
agataacggc 
gctataccat 
ccatggtgga 
tcaacccccc 
ttgcacttcg 
cctggaagag 
gttaaatgtt 
tac 
LOCUS       HUMLPL       3549 bp    mRNA            PRI       08-AUG-1995
DEFINITION  Human lipoprotein lipase mRNA, complete cds.
ACCESSION   M15856
NID         g187209
KEYWORDS    lipoprotein lipase.
SOURCE      Human adipose tissue, cDNA to mRNA, clones LPL [35,37,46].
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 3549)
  AUTHORS   Wion,K.L., Kirchgessner,T.G., Lusis,A.J., Schotz,M.C. and
            Lawn,R.M.
  TITLE     Human lipoprotein lipase complementary DNA sequence
  JOURNAL   Science 235 (4796), 1638-1641 (1987)
  MEDLINE   87149101
COMMENT     Draft entry and clean copy sequence for [1] kindly provided by
            R.Lawn, 18-MAY-1987.
            Several mRNAs ended at around position 2416.
FEATURES             Location/Qualifiers
     source          1..3549
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /map="8p22"
     mRNA            <1..3549
                     /gene="LPL"
                     /note="LPL mRNA (alt.); G00-120-700"
     mRNA            <1..3154
                     /gene="LPL"
                     /note="LPL mRNA (alt.); G00-120-700"
     gene            1..3549
                     /gene="LPL"
     sig_peptide     175..255
                     /gene="LPL"
                     /note="lipoprotein lipase signal peptide; G00-120-700"
     CDS             175..1602
                     /gene="LPL"
                     /note="lipoprotein lipase precursor"
                     /codon_start=1
                     /db_xref="GDB:G00-120-700"
                     /db_xref="PID:g307138"
                     /translation="MESKALLVLTLAVWLQSLTASRGGVAAADQRRDFIDIESKFALR
                     TPEDTAEDTCHLIPGVAESVATCHFNHSSKTFMVIHGWTVTGMYESWVPKLVAALYKR
                     EPDSNVIVVDWLSRAQEHYPVSAGYTKLVGQDVARFINWMEEEFNYPLDNVHLLGYSL
                     GAHAAGIAGSLTNKKVNRITGLDPAGPNFEYAEAPSRLSPDDADFVDVLHTFTRGSPG
                     RSIGIQKPVGHVDIYPNGGTFQPGCNIGEAIRVIAERGLGDVDQLVKCSHERSIHLFI
                     DSLLNEENPSKAYRCSSKEAFEKGLCLSCRKNRCNNLGYEINKVRAKRSSKMYLKTRS
                     QMPYKVFHYQVKIHFSGTESETHTNQAFEISLYGTVAESENIPFTLPEVSTNKTYSFL
                     IYTEVDIGELLMLKLKWKSDSYFSWSDWWSSPGFAIQKIRVKAGETQKKVIFCSREKV
                     SHLQKGKAPAVFVKCHDKSLNKKSG"
     mat_peptide     256..1599
                     /gene="LPL"
                     /note="lipoprotein lipase; G00-120-700"
     variation       1611
                     /gene="LPL"
                     /note="g can be a; G00-120-700"
                     /replace="a"
     variation       2743
                     /gene="LPL"
                     /note="t can be c; G00-120-700"
                     /replace="c"
     variation       2851
                     /gene="LPL"
                     /note="a can be g; G00-120-700"
                     /replace="g"

BASE COUNT 1020 a 739 c 806 g 984 t 

ORIGIN Unreported. 

1 cccctcttcc tcctcctcaa gggaaagctg cccacttcta gctgccctgc catccccttt 
61 aaagggcgac ttgctcagcg ccaaaccgcg gctccagccc tctccagcct ccggctcagc 
121 cggctcatca gtcggtccgc gccttgcagc tcctccagag ggacgcgccc cgagatggag 
181 o™^^ cgctcgtgct gactctggcc gtgtggctcc agagtc?gac cgcctcccgc 
241 ggaggggtgg ccgccgccga ccaaagaaga gattttatcg acatcgaaag taaatttgcc 
301 ctgaagacac agctgaggac acttgccacc tcattcccgg agtagcagag 
361 tccgtggcta cctgtcattt caatcacagc agcaaaacct tcatggtgat ccatggctgg 
421 acggtaacag gaatgtatga gagttgggtg ccaaaacttg tggccgccct gtacaagaga 
481 ccaatgtcat tgtggtggac tggctgtcac gggctlagga gcattaccla 
541 gtgtccgcgg gctacaccaa actggtggga caggatgtgg cccggtttat caactggatg 
601 gaggaggagt ttaactaccc tctggacaat gtccatctct tgggatacag ccttgglgc? 
661 T^ gCt9C l 9 gcattgca 99 aagtctgacc aataagaaag tcaacagaa? tactggcctc 
721 S»™^ 9 gacctaac " tgagtatgca gaagccccga gtcgtctttc tcctgatgat 
781 gcagattttg tagacgtctt acacacattc accagagggt cccctggtcg aagcattgga 
841 f^ Ca9aaaC cagtt 999ca tgttgacatt tacccgaatg gaggtacttt tcagccagga 
901 r?lat^lr,t gagaagctat ccgcgtgatt gcagagagag gacttggaga tgtggaccag 
961 ctagtgaagt gctcccacga gcgctccatt catctcttca tcgactctct gttgaatgaa 
1021 ? aaa ^^ a gtaaggccta caggtgcagt tccaaggaag cctttgagaa agggctctgc 
1081 9 9 9aaagaaccg ctgcaacaat ctgggctatg agatcaataa agtcagagcc 
1141 aaaagaagca gcaaaatgta cctgaagact cgttctcaga tgccctacaa agtcttccat 
1201 ^^r? 33 agattcattt ttctgggact gagagtgaaa cccataccaa tcaggccttt 
1261 ga ? atttCtc ^atggcac cgtggccgag agtgagaaca tcccattcac tctgcctgaa 
1321 c a Z= ataagaccta ctccttccta atttacacag aggtagatat tggagaacta 
1381 ctcatgttga agctcaaatg gaagagtgat tcatacttta gctggtcaga ctggtggagc 
1441 tcgccattca gaagatcaga gtaaaagcag gagagactca gaaaafgg?g 
1501 nlntl ^ 9 c 5 aggga 9 aa agtgtctcat ttgcagaaag gaaaggcacc tgcggtattt 
1561 ?=H a 9CC atgacaa 9tc tctgaataag aagtcaggct gaaactgggc gaatctacag 
1621 aa " aagaa ^ 93catgtgaa ttctgtgaag aatgaagtgg aggaagtaac ttttacaaaa 
1681 ~ « flt g tttggggtg tttcaaaagt ggattttcct gaatattaat cccagcccta 
1741 2: 5°! ta ? ttatttta 9g agacagtctc aagcactaaa aagtggctaa ttcaatttat 
1801 99 ? 9 ^^ 9 ^ 9gccaaatag cacatcctcc aacgttaaaa gacagtggat catgaaaagt 
1861 gctgttttgt cctttgagaa agaaataatt gtttgagcgc agagtaaaat aaggctcctt 
1921 ^ 9 i????^ attgggccat agcctataat tggttagaac ctcctatttt aattggaatt 
1981 ^ 99 f^5" cg 9actgagg ccttctcaaa ctttactcta agtctccaag aatacagaaa 
2041 9 ^ 9 ^ a ^? aa tcagactca t ctacacagca gtatgaatga tgttttagaa 
2101 «™™ C ttgctattg 9 aatgtggtcc agacgtcaac caggaacatg taacttggag 
2161 SSSS!!? aaag ?9tctg ataaacacag aggttttaaa cagtccctac cattggcctg 
2221 ™ aagttacaaa ttcaaggaga tataaaatct agatcaatta attcttaata 
2281 SS?iiS!S tttattgctt aatccctctc tcccccttct tttttgtctc aagattatat 
2341 » a ^ aa ^ 9 " ctctg 9gt aggtgttgaa aatgagcctg taatcctcag ctgacacata 
2401 ^cagaaaaa aaaaagatac cgtaatttta ttattagatt ctccaaatga 
2461 aB « S? n" aaaatCa ttcaatatct gacagttact cttcagtttt aggcttacct 
2521 n2X££E» tcagttgtac ttccagtgcg tctcttttgt tcctggcttt gacatgaaaa 
2581 ttttttltJ agttcaaatt ttgcattgtg tgagcttcta cagattttag acaaggaccg 
2641 gtaaaagggt ggagaggttc ctggggtgga ttcctaagca gtgcttgtaa 
2701 a ^r»™» gcaatgagc c agatggagta ccatgagggt tgttatttgt tgtttttaac 
2761 ?c ?™ I a «! a9 ^ 9a acaactactt ataaactaga tctcctattt ttcagaatgc 
2821 tcttctacgt ataaatatga aatgataaag atgtcaaata tctcagaggc tatagctggg 
2881 aacccgactg tgaaagtatg tgatatctga acacatacta gaaagctctg catgtgtgtt 
2941 gtccttcagc ataattcgga agggaaaaca gtcgatcaag ggatgtattg gaacatgtcg 
3001 gagtagaaat tgttcctgat gtgccagaac ttcgaccctt tctccgagag agatgatcgt 
3061 gcctataaat agtaggacca atgttgtgat taacatcatc aggcttggaa tgaattctct 
3121 ctaaaaataa aatgatgtat gatttgttgt tggcatcccc tttattaatt cattaaattt 
3181 ctggatttgg gttgtgaccc agggtgcatt aacttaaaag attcactaaa gcagcacata 
3241 gcactgggaa ctctggctcc gaaaaacttt gttatatata tcaaggatgt tctggcttta 
3301 cattttattt attagctgta aatacatgtg tggatgtgta aatggagctt gtacatattg 
3361 gaaaggtcat tgtggctatc tgcatttata aatgtgtggt gctaactgta tgtgtcttta 
3421 tcagtgatgg tctcacagag ccaactcact cttatgaaat gggctttaac aaaacaagaa 
3481 agaaacgtac ttaactgtgt gaagaaatgg aatcagcttt taataaaatt gacaacattt 
3541 tattaccac 
LOCUS       HUMTHB      26928 bp    DNA             PRI       14-OCT-1994
DEFINITION  Human prothrombin (F2) gene, complete cds, and Alu and KpnI
            repeats.
ACCESSION   M17262 M33691
NID         g558069
KEYWORDS    Alu repeat; KpnI repetitive sequence; liver specific; thrombin.
SOURCE      Human DNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 6128 to 26928)
  AUTHORS   Degen,S.J. and Davie,E.W.
  TITLE     Nucleotide sequence of the gene for human prothrombin
  JOURNAL   Biochemistry 26 (19), 6165-6177 (1987)
  MEDLINE   88077877
REFERENCE   2  (bases 1 to 6667)
  AUTHORS   Bancroft,J.D., Schaefer,L.A. and Degen,S.J.
  TITLE     Characterization of the Alu-rich 5'-flanking region of the human
            prothrombin-encoding gene: identification of a positive cis-acting
            element that regulates liver-specific expression
  JOURNAL   Gene 95 (2), 253-260 (1990)
  MEDLINE   91065538
REFERENCE   3  (bases 1 to 26928)
  AUTHORS   Degen,S.J.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-SEP-1987) S.J.F. Degen, Division of Basic Science
            Research, Children's Hospital Research Foundation, Cincinnati, OH
            45229-3039, USA
FEATURES             Location/Qualifiers
     source          1..26928
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /tissue_type="placenta"
                     /clone="L[14,25,33,36,81]"
                     /clone_lib="Lambda-10"
                     /map="11p11-q12; 24 bp upstream of NcoI site"
     misc_feature    405..511
                     /note="MER sequence"
     repeat_region   563..838
                     /note="Alu repeat"
     protein_bind    725..731
                     /bound_moiety="Ap1"
     repeat_region   842..1136
                     /note="Alu repeat"
     repeat_region   1148..1344
                     /note="Alu repeat"
     repeat_region   1814..2070
                     /note="Alu repeat"
     protein_bind    2052..2059
                     /bound_moiety="Ap1"
     repeat_region   2577..2870
                     /note="Alu repeat"
     repeat_region   3122..3415
                     /note="Alu repeat"
     repeat_region   3804..4087
                     /note="Alu repeat"
     repeat_region   4210..4511
                     /note="Alu repeat"
     repeat_region   4553..4793
                     /note="Alu repeat"
     repeat_region   4901..5201
                     /note="Alu repeat"
     protein_bind    4957..4962
                     /bound_moiety="Sp1"
     protein_bind    5084..5091
                     /bound_moiety="Ap1"
     repeat_region   5231..5443
                     /note="Alu repeat"
     protein_bind    5231..5238
                     /bound_moiety="EBP 20"
     protein_bind    5711..5716
                     /bound_moiety="Sp1"
     protein_bind    5723..5730
                     /bound_moiety="EBP 20"
     protein_bind    6047..6054
                     /bound_moiety="EBP 20"
     misc_feature    6198..6237
                     /note="MER sequence"
     exon            6544..6653
                     /note="prothrombin precursor"
                     /number=1
     sig_peptide     join(6575..6653,7040..7089)
                     /gene="F2"
     gene            join(6575..6653,7040..7200,7860..7884,8127..8177,
                     10504..10609,10706..10842,13181..13495,13820..13948,
                     'oiioS'ifll 7 - - 15484,15982..16155,16698..16879,
                     26327..26397,26544..26687)
                     /gene="F2"
     CDS             join(6575..6653,7040..7200,7860..7884,8127..8177,
                     10504..10609,10706..10842,13181..13495,13820..13948

                     26327..26397,26544..26687)
                     /gene="F2"
                     /note="precursor"
                     /codon_start=1
                     /product="prothrombin"
                     /db_xref="PID:g339641"
                     /translation="MAHVRGLQLPGCLALAALCSLVHSQH^
                     NTFLEEVRKGNLERECVEETCSYEEAFEALESSTATDVFWAKYTACETARTP
                     CLEGNCAEGLGTNYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNP
                     DSSTTGPWCYTTDPT^^
                     QQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPTC
                     AGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEA
                     DCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCG
                     ASLISDRVA^TAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIH
                     PRYNWRENI^RDIAMKLKKPVAFSDYIHPVCLPDRETAASLLQAGY
                     ETWTAWGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEG
BASE COUNT 
ORIGIN 

61 ?t?»^ a cc 9^«ctg accacatata atttttatta attataatgt tgaaagtccc 
121 tatactaaat ttaaar^ " attca «<= ctggtaggtc atttttaa?g atttgatgta 
181 «m^ a ttggatgctt cttgctacag ggcaaagacg ctaataagat tttgctggag 
111 a~ S^f aa9t<= aatcca 99ca gtgtctatag ctgctgaacc caafatcaga 
301 aa ?r™ff tatcaaagct cttctgtcct gatttgcaac tttagtagtg caagaaaaaa 
361 a taa« a ! aa "f 333 " 99 g taccgttca gagaccttta gagactgcaa ggcltcacag 
361 atgataaaaa gctccatctc tagacgtgtt caggagtggg ttggggcttt qaccttaact 
421 agctgcatca acttggacaa gtcacttcgc ttcccEgtgc ctllgcttcc fcatccataa 
541 tlltlllltt a ? tatagtac "acctcata agtcctgcct acccagcaca tg^tgagcaa 
601 at a nr aa3 f gtaggccta 9 tccctataat cccagcactt ttggagaaca aigtlgggga 
661 mtrll g 9ccaggagtt ccagaccagc ctggccaaca taicgagact gtitttllat 
121 o a atc aaaaa aaaaaaatac «aagcttgg tggtgcaggc ctgtaglccc ggctacttjg 
ill acSL? a^ ? a ?9 a 99att gcttgagccc aggagttcaa ggttgtagta agctatga?? 
Ill ggccaagcac 9gcgacagag catgaccctg tctccaaaaa tataaaltta 

901 flact?,?^ agtggttcat gcctgtaatt ccaacatttt gggaggccaa ggcaggtgga 
96^ taaaaftaca aaS?^ 9 " cgagacca 9<= ctgggcaaca aggcalaatc cSgtccctac 
1021 agctaaaoea anttt tf C ° f 992 ^^ gtacacgcct gtaatcccag ttactgggga 
1081 gccactgcac tacaocct™ "gaacccgg ^^ggcgaagg ttgcagtgag ccaagafcgt 
1141 attaatclat »»»™^ B g agacagagc gagactcgat ctcaataaat aaataaatta 
1201 aaaaacta™ aaaaaaataa gttgggcatg gtggcacctg cctgtagtcc aagctactca 
1261 ffcgccacla clc?^" 3 ^ ca " tga9C <=aggagttct aggctgcagt gagctattat 
1321 gtcfcaaaat aaaa^ 9 ? tgctg tatgt actccagcct gggcaacaga gtgacaccct 
1381 ?tatc a339 £ 3a3 ? t33aat aaaaat taaa aaacaaatta ctaaattgta cttaacagta 
1441 caoata aa? »o^ Ct333 tag 9 a 99 a ^ a ggcaaaatta agggacttaa catgtgccct 
1501 cStetrr £££ ""^ ag 9 cca g«t cacccgcaca gtagttctgt actgtlggtg 
1561 acaaaaotra S^ 30 " tat 99cccag tgaggccgta ctctaccaga atgicafggg 
i«i a = aa 99gttg ggagaggcaa aagtgctggt ctgaagcagg agtctgggtt tccatcctag 
llll ataaaac 3 ^ aatt ^9tat gaccgtgccc cctccatt?? ctccafgacc acatagagal 
HA taactcctct t??^? 33at " atgattcc cagtcttggc tctatcltgg aaccactfgc 
1801 gactcatact 1 " tggatcccat atttttaaag atttttacta aatagaaatt 
1861 J "tccaagct ggagtgtggt ggcatgattt cagctcactg caacctccgc 

1921 caa 9tgattc tcctgcctca gcctcctgag tagctgggat tataggtgct 

llll SSSS? BQCtaatttt tttgtatttt tagtagagac agaatttcac catgllggcc 
2o2 allttaca™ ^f 3 "?" 9 acctcaagt 9 atctgctlac cEcagcctcc caaagtgctg 
21oJ aaactact?! aat-?^ 30 ^ atgccca9C cgcttactca cattttctag tcaaaacaga 
2161 ««S ^!^ 3 ^ 9tC tgca 9 a agag caaaaaaaaa aaaagaaata aaaaattgaa 
2221 aotcoaafaa S^" 93 ? 3 aaaacataa 9 attattcacc acctaaagag aaaaaatftc 
2281 tftaacaall K? 33 3 " cattttCgtc "aataaggc aaattcacaa tttttgaggt 
2341 ^^« 333 ta 5 at 9 ca 9 a aagacaaggc caccccgtag aacgtgcaca cagccctagg 
llil actlf aaa t? ^alS^" 3 ataatatctg gtctttcttt gaglcltgaa atfctctaac 
2461 taaca?taar ?? 33C3taat tttact 9ttt tcagtggtta tagagatttg ctttacaatt 
252J ctgcccccla ctt™^ gat " tgttt gacgccaact tgttggcagg aatgcacccc 
2581 oaSS ^" 9ttatg g cc ttgctcc tatagggcaa gaatatctgc tttaaggccg 
2641 ?^^? 9 99 ctca 99 c ctg taatcccagc actttgaggg gccaaggcgg gcagatcacc 
llil taalaaaaa? taactSa?, 30 " 9 « Cggcc ■W»tS B tS aatccfgtlt ctactaaaaa 
2761 aaacaaoaaa «^»^?? 9t9 ^^aca cacctgtaat cccagctatt tgggaggccg 
2821 tgcactccaa 9M ccca 99 a ggc ggaggttgcg gtgagccgag attatlccal 

2881 tftgctttaf a^?^ agagcaa9a t tccgtctcac acacaaaaaa tatatatatg 
2941 ataatacca 3 ?r" 9C39gC ^"tgtg ctgaacggca ggaatgccaa acttggctgc 
3001 aaooactt 33 tint?? ° Ct caga 9tccca aggagaacaa acagttggtt cctggaggct 
3061 Ia?o«aa?a c^ 39 ^" 1 " gaagactaa 9 catgtgctgg gtccattgtt gtcc£gcLc 
3121 ctt?ttt?t? S MC ctaacctata tttaagtgtt tttgtttgtc caaaaaatgt 
3181 tgggagtcaa gagtcttgct ctgttgccca ggctggagtg cagtgacacg 

llll caaataacta aSan^? 0 " ccgcctccc 9 ggttcaagct Ittc??ctgf ctcagcctcc 
caaatagctg agactatagg cacgcacatc catgcccagc taattttttt atttttagta 
3m c ?arS ' sgccaggtc QStcttgaac ccctgtcctc aagtgatcca 

3421 "Ccccaaag tggtgggatt gcaggcatga gacaccgcgc ccggcctgcc 

llil "gtcccttc ttaaaatgag ttgtccattt gtaagccgct gatttctttg ggacattgtc 
3481 tccgtaaact tttcataaag catcagtgat ttcaccattc ttccacccaa gcttcaccgc 
llil ccc t^ "^"cttg cttcaatttc agcagaattc atttagctct gataagggct 
llil rfrt ^ tgatgtctta tccttcttag tgcctcaaac tacatcctgt Ecactcatgt 
llil SSfSSl tagtgtgagt ttattttggt gcacaaaaat ttttttaaat ccatgcagtc 
llil Illicit ata ^ cat " tccat 9 aact tttcgaagac cccttgtaga tgtcEgtfgt 
llll f n cagtttacag taattttttt ttttttttga gatgaagtct tgctctgtcg 

llil aocalftt?? c?^ att9 ? cacactctc 9 sctcactgca acclctgcct cctgggftca 
3961 tlactaattt a ^ cc " c 9 a 9 ta Qctgggatta caggtgtgtg ccaccatgcc 

4051 ntf^ttltl atgtgttttt agtagagacg gggtttcact atgttggcta ggctggtctc 
toll mt^lT ccttgtgatc g^ccgcctc ggcctcccaa agtatEggga E?aclggcgt 
till ?^ t f ttB cactt 9S cct acagcaattt tatagcagcc taggctalga tagccatttc 
ilol ™' a9 aat 9 tca «t actgaacagg cctgcaactg tglgtaaalg tclgcaaaga 

4201 ggccgggcag tggctcatac ctgtaatccc agcactttgg ggggccgagg caggtggatc 
22 ttttlclllt t^ tCga gaccagcctg accaacatgg fgafacccca tcfctactaa 
4381 acta^rn™ " a «« t 9«c gtggtagtgc atgcttgtaa tccctagcat gcacttggga 
gc ^ acttg 99 aggctgaggc aggagaatca cttgtactca ggaggccgag gttgcagtga 

«S Icafaacaaa SL? CtCCtttctg ^tgacagag fgaglctcca ?«laaaala 
«aui acaaaacaaa acaaaacaaa aacaaacaaa aaaacccaac aggtaggtao caatoattca 
till IttllllT ccccacttCg WWctaaag tgggcagatc acctgfggtl ag?ag?tcac 
4681 ggcaacatgg tsaaactctg tctctacaaa aatalallaa tllglcaggc 

47?1 cctaaSS? ao 9 ™?,^ tcca * ctatt cgggaggctg aggcaggaga atcgctCgfa 
4801 agaggttgca gtgagccgag ttcacgctat tgcactccag cctccatctc 

4!°! tttattttat tttatr^ aaacatatat tataatttta ttttattta? tcaattttat 
4921 nl^t C " tat tttatttttc taggaacagg tctcattcag gccaggcatg gtgctcacgc 
till tK^? a gcacttggg aggccgaggt ggaggtgggc ggatcacctg Iggtcagglg 
5041 I cctggtcaat Qtggcgaaac cccatctcta ctaaaaatac alLatfagl 

llil ctloaacloa SS" ^ gtaattc <= a gctacttggg atactgagtc aggagaatla 
MfiT 98 ga 5 at 59 aaa ttgcagtgag ccgagattgt tccactgcac tccagcctgg 

llil S^!" C gagactcc g t ctcaaaaaaa aaaaaaaaaa agaaagiaag aaagalagfa 
5281 aS™^ tcttactct 9 ttacccaggc tggagtacag tggtgcaacl atagctcact 
5281 gcaggcatgc accaccattc ccagctaatt tttaattttt tttggtagag atgagggtct 
=341 tgctatgttg cccaggctgg tctcaaactc ctggcctcaa gcgafcclgc ca?5?cggcc 
All «-«^ 9 ttgggattac aagtgtgagc cactatgcct ggcctaaala tatatafacg 
ItH otttttcta 3 a ? aaatg " C « tccca 99 aa "aaggtgtt tgcgggagtc ctggtcccca 
5581 £™™ caacactccc tgttcccaca catgacctgg cccagacccc aaacagccag 
3581 gcccaaagga caggtgaggc gaggcgagaa cttgtgcctc cccgtgttcc tgctctttgt 
5?£ a S «« ta 9 a « aatatttgcc ttgggtactg caaaciggaa afggg^agg 
lill gacaggagta SSgcggaggg tagggtagga ccagaagcct ctctaggcct gccaEggggc 
5821 t™ 9 f" 9 ggagaag 9 ag ggcccctcag tggagaccca gggatlfcag ?agcSgt 
5881 taal™ a ? ES" aBtCC tgggaggt ga cagaagatag actaaaggcc caagagtccc 
5941 allZvZlZt* """cage agctgccaca cacaaacaca cctccaggca cccEggacag 
Innl gaaggaggag a aatgggccc ctcctccagt ggctgagaag ctggggcaaa tgttggctgt 
6061 S ??^? Ca " cc "ggcgaggg gcaactlccl tcSScaca cltt?latlt 
6121 ttttgatatc tgtgtattat gattatacaa acccccacat tggectatat 

till mttttlA gattaagaac ttacgatatt ccatggacat tccattccta aEctccttta 
till ll™lnt*t caaa g fc atta ttcccattgt atagatgagg aaactgaggc acacagagat 
till Iactaa=c^ » a ? C9CtaCa tgttag g a " cgaaggagct ccagglalgt ctcatlglcc 
till cttt?ttata a ^ "ctcagagg gggagggtgg gagatggggg tgacag^gac 

c^ot c " cccc g t 9 actcctccta gaccatccat ccctgctccc aggaggacct qtcctcccaa 
6421 atggtggaga tggacaggag gactatctac ccacccgtcc ccfcggccct gaccctctgl 
till Zlltlttltt tCCgctgatt tcttcatgtt agttcalcat tacccfglgg ggtcaggala 
llil a cagtgaccca ggagctgaca cactatggcg cacgtccgll gcttgclgct 

6661 SSSSSSf Ct9gccctgg "gccctgtg tagccttgtg cacagccigl ItggLaggg 
6661 agtgctcgca ggctggaaca ggctggagga ctggggtgtg ggcccatggg ctggggtctc 
6721 ctggctggac agagcacaca gagctggccc ctaagtaggt c?cagcccca ggcggccagc 
6841 S 9a9C tCagggctgg aaaglgal?g gccgcttcrc SSSS 

6901 gccSaoo cc fr^" aggggca g fc g taggaggggc acagggggee acatttagca 
6961 nr^JH "ttccacca gcccagactg cctctctcag aagccagcag gggagggtgg 
6961 gettgettea tgcccccaga tggecaagae tgcctgttcc tgaggtcget gttccatolc 
708^ «™ a ^ ccttt acag t gttcctggct cltcagcaag caeggteg" gctccaglgg 
7081 gtceggegag ccaacacctt cttggaggag gtgegcaagg gcaicctlga gcgagagtgl 
7141 gcggaggaga cgtgcagcta cgaggaggcc ttcgaggctc tggagtcctc cacggctacg 
llll 2™S™ 9 ct 9 c "9ga cggtgccggg gcctcagacc gggcccaact ctagacactt 
7321 t«afl^" 5«agcgagg aacgccacag ccccttcgct gctcacagcc tcatttcaac 
7381 n ^ '" ^5 C ! Cafl ? gctggcaaga QSagcggcct cagcctttcc tgggggtctc 
7381 tgcgcctgga ctgtgtccct gtgcagctcc atgacatggg gaggcctcca cagtcttcag 
7441 acatccacct gccttggagc tctgtgtcca catggcctcc tclgcggcag acEcccacal 
7S«? = a * cc " gag gwtgggact ctggggaggc caccacaagc ccclgggctl aagactcagt 
7.S ^ ctctgcgtcg cctctcctgt ccgtagggac tctglcaggg acccactglc 

ill, """ C " C ccat fCcccc cagcctcttt cagactcggt gtgtgtgttg gaggaactcc 
77!} ^" atCCCCa aatattcttc tccttttgga aacaaaagta ggaaacEctl ccacaaacct 
7801 tact™^ i?f« CCt0C gtgaccag 9 g taaggaaagt gtgaggagga gcataacatt 
7861 a " caaaaca ggagctgccg tagcctcact cccagccctt gtttttcagg 

7861 atgtgttctg ggccaagtac acaggtgagc accgggaagg atttgcccca ggaagggagg 
S« " a ? t9a9ag aattctacc <= agagaatctl ctgc?gcacc SgeStcS 
loll ttt«r»rn« =^ CCCCaC tccttcct tg gtccctccca cctgttcatc caEctttctg 
Bin? ™£ acatcccatc caccctgact ccagctcatc ctggccatac cccaatccca 

till to«a aaa o a ^n 9 " tC ^ ttccagc "9 tgagacagcg aggacgcctc gagataagct 
8221 nett™™™ alt 9 a " t9 agcaacc 9 a = acgggtctgg ggagcaggac atggagggga 
8281 ™c?? a ?S a ^?f 0399 ggtgggttt 9 gagtgtggct ggtggaggcc gaggcigEcc 
83a1 nr^r.^ 9 acatt 9 ctcc cattcctggg gtcaagatgt ctcttCgtac ctggctctgt 
llm fin " ^ 9 = gaacgaat 9 aatgaatgaa tggactaatg aattaatgtt tttttttttg 
MM a9a ^ 9a9 C ^ cgctctgtt Scccaggctg gagtgcagtg gcacgatctt ggctcactgf 
B521 S tcccggattc aagcaattct ctgcctcaac ctcclaagta gctgggat?a 
8521 caggtgctcg ccaccacgcc tagctaattt ttgtattttt agtagagacg gggtttcacc 
fill ' l^f WCtoatett gaactcctga cctcgtgatc cacccacccc ggcctcaaag 
liol a ^ agaagtga g=caccgcgc ctggccatga attcatgttt alggcttca? 

87sl S ? ctgacccga 9 cctctgcccc cacctagtca gagctttgat gatgtcacat 
882^ ™ gctttaggtg tcactgaacc aaacaggaac ccaaaccccc IgcEgctctg 

8881 t fl ?^? c "ccctaag catgccaagg tgtttctagc acccggcctt gcatatgttg 
llll cttcaaatat catcacatct actgaacact ttcctatcct Icaagglctg 

9001 tllzlnrttl . a ^ aCtttt 9 ct 9 a gactt cagggagcac cctccctcct gcactgtgtc 
9061 ^ 9aa ??^^ C ttta 9 cac 9a caaaaatgga actctttgtt tatttataag agcagggtct 
9121 c "aaaa tSf' c " gaacC « tgggctcagg caattctccc atctcagtct 
9181 ttttt 3 ?? 3 ? t???, I 3 gtgtga 9 cca ccatgcctgg ctgccatact ttcatttttt 
«!T ""tttttt tttgaggtgg agtctcactc tgtcgcctag gctggagtgc agtggcgcga 
9301 S?^ "gcaacctc cgcctggcgg ttcaagtga? ?ctlc t gclt tfgcftcc?*- 
9301 agtagctggg attacaggca cacactacca tgcccagcta attttttgta ttttttagta 
lit] gagaCggggt ttcaccatgt tggccaggct ggtcacaaac tcctgacctc aggtgatlca 
948i tta^aat? aca^f ^ tgCtgggatt a <=aggtgtaa tccaltgcgc ccagcctcat 
9481 ttgttaaatt acgtactcaa cagacatttt acaaagttcc tgctacgtgc caggcactat 
llil tctS 9 ^ WWatttta agagaatcaa atacagtctc tgccttcaag gaaltcaaaa 
Pfifii tctcaaaaga gaacaaaaat acaaaatatt aaaatgattg cggccgggtg tggtggctca 
9721 a?cttaacca ae^T" tgggagctga 99tgggcgcc clggcl^gg gfttglgacc 
9781 or™™ a catagtgaa acccccaacc tctactaaaa atacaaaaat tagctggggt 
III, ?^? 9 ^" CaC gc 9 cct 9 taa tcctagctac tagggaggct gatggggaga atttctt^a 
9901 ^ ctgggaagc g aa ggttgca gtgagctgag atcatgccac tgcaclttca gtctcggcat 
till a a ?ctct a S S™f tCM atacataata aataataatt clataagtga ftgcalgaaa 
10021 aaaataactt SSS™^ 0 agaccacagg a aaatgagtg tctggtttgc cagaaaatga 
innai gagatg ? ctt cccaggagag gcagagttct gcctggccta gtgggatgca tggatgaaca 
1W41 ctltS 999 Cattccagtc agaagaaaca atccgtggaa IgalLagag gcatglgaag 
10201 ?a «a a !?2 aCa99t agaccagg 9<= cagttgaaaa ggaccttiaE Sactttttcl 
lolll ctatfoa??a S = a cagaatgg aagctccatg agggcagggc tgtgactgtc 
toi7i atltt 99 ! 9 acgt 9 tact 9 agcacccgac agtgcctgtc atatggtagg cacctagcga 
loll] arZlll 9 ? 9 gccactgttg agtgaatggg agaactgctg gttglagagg aagagggglt 
S SS tlllTallll "£" aCCt gcatgag «g Sgalgtggg? gafagfllac 
10501 caaotafc?« tl 9 .^ I aa gtccccag gctccaaggc tgaccggggt ggggtctccg 
inlfii ^ 3 EH 9 tgct 9 ag 99 t ctgggtacga actaccgagg gcatgtgaac atcacccggt 
lllll artist?* gtgccagcta tggaggagtc gctacccacl EaagEc?gag tgagtgaS 
10SB1 cctcSac Illicit 9 .? « gagaa « g 99 a gcaagcg tacctcalg? tlafclglc? 
10741 Iaafaa???r S*~5~ tCt tccagaatca actccactac ccatcctggg gccgacctac 
10801 alccca^ot tlt-nJ &&C cccgacagca gcaccacggg accctggtgc cacactacag 
10861 ™™f 3 ^ gaggaggca 9 geatgcagca tccctgtctg tggtaggctg ggggcagtgo 
lilt} Sf? ? ? a ccaagccc gggggcttca tggggcctgg cllcctgggl cgggaaccfl 
10921 gaatactggc tacccaggca cagtggctca tgcccgtai? cclagcll?? t^aggc" 
U?4 8 i Sctltac taaaaatacl E"?? 99 " ^""W tgggccaaca tggcaaaacc 
iiinn ^ cg ^ ctac taaaaataca aaaattgcca ggcgtggtgg tgggcqcctc taatcccaac 

in! ESSE ?2SSSS t9aac " 999 a ^ a5 " 

11221 aaalaaaaa? r^r™?^ tcagcctagg cgacaagagc aaaactctgt ctcaaagaaa 
11281 attorrS g " ggccacc ttcagagctg gcgtcagtca ttcagatcac atctgtgccc 
U341 tccccctc?c cacfcttoac a ? tCa ? ggga c «gagtggg gggalctgcc agcc?cltcc 
11401 cctttctaat cL™ ttccttatgg tctaggctgt ggctcattcc aaacatgcct 
11461 ^ caa 33 c actc ctccctccgg gaagccctcc ctagccattt cagtccacac 

llSS tatta?"cc lllttltatl I* 9 ^ 9 '" tgtgcagttt mem Stgtlat 
11581 S f ;« "ggtgtgtt aagtagctat agccacccct tccctgaggc agaccacaat 

lliSl gggtcaactg acqqaqqtt 9 - aggg " ggca WWOTctg cactcgc?aa tgcgtctgta 

11701 St^r™? ?5 9 gccct 33ctg ggtggctctg attcaaataa tgggtccagc 

11761 aSoo^o tCCCcgttga ggsttgggcc tagatctget ccacgtgcgt tcalgctggg 

nR9i 9 " 9a " Ctg aaa 9aaggta cctgggaaaa ctcttcttat gctgatgaca gacacaqaaa 

lllll cctqcccacq gaaaag ^ gtc "ctgtcctg aaggcctggc ?cagaalagg cacagtcagc 

FF =s ass ssassss 

I ™ 99 ~ ESSE £S a™ 

I S i™ 9 ™ 9 =2=2 ssss 

~9ini cgccaccact cctggctaat ttttttttat gttagtagag acqqqqtttc accatatrao 

12361 Sa at Ct ^ aaactcc tgaccttgtg acccLccgc ctlggcctcc caaactgcS 

12421 aqtctcae? 9 9 ?^ gaggcac tgcgcccagc catttttttt tttttttttc tttgagltgg 

"Si cctcctgqgt tggagtgcag tggcataatc ttggctcact gcalc?tccl 

r.r.Z.^ ggS tca 39cgatt ctctgcctca gcctctcata cagctgggat tacagqeaca 

iSSJ tfgcltgal? S?*"?' tagtagagac WWttStt catg?fggcc 

12661 qa?t^™ tgaactcctt gttccggtga tctgcccagc tcggcctccc aaagttccgg 
12721 9 =" 9gt gtaa 9 cca ct gcgcctggcc cctggtattg gtcttatagc aagtttatcc 
dill laataaatta tactccccaa cccclataca cacgcacala caftjatgat 

12841 aamta 5%-**°? gaaattggcc ^tccaggtg aacagectag tgatccgigc 
12901 cqfa?^? = tgtg = agcC ataaaaacat gactcctcca gcagctccag gcagccacta 
12961 ™ 09 5 ac agatggcc taggaggeca aacctggtta ctatctctgg tttattatot 
dill llZtllZcZl ^ atg «9tat attttgttta atcct«caa caaacctgcf aaagtggclt 
13081 calaqq™ Zl~ S " Ca aacggtca 9 a ageccagaga ggttaagEaa cctgalgtca 
13141 t« a ?^ a ? a aagcagcaag aceggggtte acacccctgt ctgttccggt ccatgtgtgg 
lllil g^gatgactc «cEt£™ ^ttgcccct cacccaccag gcLgga??a agtcac£X 
13261 S?~?»f£ ca C3Ctccga aggctccagt gcgaatctgt cacctccatt ggagcagtgt 
13321 ? 999ggca9ca 3taccagggg cgcctggcgg tgaccacaca Eiggctccfc 

1338} gSacaoc SE^ 9080 a « ggccaa 9 Sccctgagca agcaccagga c??Iaactca 
1*441 9 ^?^? f 9 tg 3 t 3Sagaa cttctgccgc aacccagacg gggatgagga gggcgtgtqq 
nS01 c?qcctqqq? ^ 9 " aa9 ' C tggcgact " gggtac?gcg IcctclacL ?fgtgg?gag 
13561 th 9 ^??!^ agggggcct 9 agttgcaggg acaaatccta gtgggaataa caacagccgc 
13621 tS££ c 9 aa cg«ta cctcattgag tgegctcatt acigccttac agtaaccagj 
13681 99 tcc tgtgc ccatttcaca gataagtaca ctgaggcccc aggaggttal 

13741 9 9 cccaact 9 t 9 catgcacgct taacctctgc accaaatggc clccaaggcc 

lllll q^^ 9 ? 933 ^ tggggggat ctaggggatg ggtgaggaat ggcccagccc agtcccggcc 
lSsi air?-? 93 tcccaaca 3 a ggaggccgtg gaggaggaga Ilggagltgg gctgSIIag 
lllll Icgaggacct ttSSS^S agggCgtacc 3<=caccag?g agScfagal Ltfftcaal 
13981 cSnr^ "ggctcggg agaggcaggt gaggtagtgg gcatccgagg ggatgcgggg 
1404} ctqcqfqcto ftr 3 ^" acttgcccct =ac t gcttgg cttgctctgl agac?g?ggg 
14101 tc?tacltqq gaagtcgct 9 gaggacaaaa ccgaaagaga gctccEggll 

14161 tqtqtcctqa aq™ 9 ^ 5 gtggagggc teggatgeag agateggcat gtcaccEtgg 
14221 accltcaqql 2SSS!?S BC taccattcac tcctgggggc aggtgtgctg ctggaccccc 
14281 tcrr^ a f?^ = ctgcctgca ggcctgggct ttacagatga caacagctga gcatccagga 
Idll tcttaq^qq aqr^^T agccacatga gacgggttgt ttacttcttt Sttttttg 9 ? 
14401 ctacc?cql? « ^ gtcaccta g gctggagtgc agtgctgcaa tctcggc£ca 
14461 cagcctcctq aa o ^ g ^ aacttct gccttccggg Ctcaaacgat tctctEgcct 

14521 tttalataof qlr™? 33 tttaca 9aca tgcgccacca cacccggcta atttttgtat 
14581 atoa?cnarr ? a ^ a999ttt caccat g"g gecaggctgg tcttgaactc ctgacctcaa 
1«2 ggwcatgqq tlettS?^ " CCaaa9Cg cc 9 gga "ac aggcatgagc caccacaccc 
14701 llarnrt?, 9 . 5 , J£ ctttactt ctaagcagat ggtaaagctg agactgaegg agctggtggc 
Itlll cgcfatgtqa Ic^t^ tg99tttgaa tec«ttctt clgatfcclg agctglgcfa 
cgctatgtga actctggact ggaaggacct agttaggggg egcaaaaage aggaggcagg 
14821 
14941 
15001 
tcaggtgcag 
acttgagggc 
aaatgcaaaa 
tgaggcggga 
attgcactcc 
aataaaaagt 
agcccaagct 
gattgttact 
ctgccgcccc 
gggccagcct 
cctgggacaa 
caaggtacag 
catgaggggc 
acatcccagc 
cggggagaat 
tctgaaaata 
aggctgtctg 
tgccatggca 
agaaccccaa 
tctttcttgg 
gatctacatc 
gaagctgaag 
ggagacggca 
tgggcctggc 
gtttcttgga 
ttatttattt 
cttggctcac 
gtagctggga 
tcggccaggt 
gtgccgagac 
tctctaagaa 
ttcttccttc 
aacctgaagg 
gtgaacctgc 
gacaacatgt 
agctgggaga 
cctcttggtg 
gttatgggag 
agtgagctca 
agcaaatatt 
acccagacgg 
aacagatcta 
gagaacggct 
tgaggtcagg 
tacaaaaatt 
ggcaggagaa 
ttgcactcca 
aggtggccag 
gaagagggga 
catggcaggc 
gaggcctggg 
ggacagccac 
ggggcttgag 
ttagtgtaat 
tgtcatccag 
tagtttgaga 
aaaaaaaaaa 
ccggaaattt 
tcggctcact 
gtagctggaa 
acagggtttc 
cctcggcctc 
gaaatttttt 
gcatggtctc 



tggctcaccc 
aggagttcga 
attagccagg 
ggatcgcctg 
agcctgggtg 
gtgaggcagc 
tggatctggg 
tctagggctg 
tcccaggcag 
catcagtgac 
gaacttcacc 
aactggtggc 
cttggtggct 
agtctctgct 
ccgtctgtct 
gagtctgtct 
actccaaagc 
ggaaccagcc 
gggcaggcag 
ggtctctgca 
caccccaggt 
aagcctgttg 
gccaggtggg 
tctgatacca 
gtgaacccaa 
actgacggag 
tgcaacccca 
ttacaggcta 
tggtctcgaa 
cacaggcgtg 
atggcgttgg 
cccaaagctt 
agacgtggac 
ccattgtgga 
tctgtgctgg 
actgagttgt 
gctcagtttc 
ggttaaatga 
gatagcagca 
tattgagcgc 
acaatgtctg 
aaacagcaat 
gatgaagtgg 
agttcaagac 
agctggtcat 
ttgcttgagc 
gctgggcaac 
agaaggttgg 
aggaaggagt 
actaaggccc 
gggctgagga 
ttcctttagg 
gcaggttaag 
ttcagactca 
attgtcctcc 
gcaaatcatg 
aaaataccca 
ttttttgaca 
gcaacctcct 
tcacaggcat 
accatgttgg 
ccaaagtgct 
tttttttttg 
ggcttacttg 



ccgtaatccc 
ggccagcttg 
tgtagcagca 
agcccaagag 
acaagagtga 
ccctcagcat 
ccccggaggc 
gtgtagaggc 
gtgatgcttt 
cgctgggtcc 
gagaatgacc 
ccgtgggtgt 
ccgggacaca 
ggaaagccat 
ctggtccctc 
ggactagggc 
cctgcacggc 
ctatcccctc 
tttcctgctc 
ggtacgagcg 
acaactggcg 
ccttcagtga 
ccaccagatg 
agtagccttg 
aagttctttt 
ttccactctt 
cctcctgggt 
atttttgtat 
cccctgacct 
aacgtctgtg 
ggccaggcgg 
gctccaggct 
agccaacgtt 
gcggccggtc 
caagtctgtg 
gcctgggttc 
ttcctctgta 
agtagtatat 
agaggctgcg 
ctatcacgtt 
tgccctcaga 
ccctgaccag 
gcttctaaat 
cagcctggcc 
ggtgacgcat 
cagggaggcg 
acagcaagac 
agaaggcctc 
gagcaggcat 
tgaggtggga 

ggggcagcag 

gcctggaagg 
aaatgatgtg 
cagaaaagtt 
attctgtgga 
aatatggttt 
aggatgttct 
tagcttcgcg 
gctcccaggt 
gtactaccat 
ccaggtcggt 
gggattacag 
agatggaatc 
gaattacagg 



agcactttgg 
ggcaaaatgg 
tgtccctgta 
gctgaggctt 
gaccctgtct 
cacacggagg 
agctctgccc 
agccccctca 
tccggaagag 
tcaccgccgc 
ttctggtgcg 
ctggcagggg 
taggatgttc 
ttggtcacgt 
caacactagg 
gtgcagcctg 
tttaggccca 
cctggtggcc 
cttgctgggt 
aaacattgaa 
ggagaacctg 
ctacattcac 
cttgttagct 
caagagcccc 
cagtactggc 
gtctcccagg 
tcaagcgact 
ttttagtaga 
caagtgattc 
cccagccagc 
ctcctgtggg 
ggatacaagg 
ggtaaggggc 
tgcaaggact 
cagggcgggc 
aagccatgtg 
aaatggaggt 
attaatgtac 
ggtagggaaa 
ccaggcagcg 
gagcttcctt 
tgctgtgaag 
agggtggcca 
aacatggtga 
gcctgtagtc 
gaggttgcag 
tccattgatc 
cctgagaagg 
atctagggga 
gcactcttgg 
tgggtgaggg 
actttattga 
actgacttta 
gtaaaaataa 
tgtgtgggaa 
ctttttaccc 
cttatgcaac 
tcacccaggc 
tcaagtgatt 
gcctagctaa 
cttgaactcc 
gcgtgaacca 
ttgctctgtt 
tgcctgccac 



gaggccaaga 
taaaaccccg 
gtcccagcta 
cagtaagctg 
caaaaataaa 
ctccagcccc 
agctgggttc 
tcctcagctc 
tccccaggag 
ccactgcctc 
cattggcaag 
tctgagtcct 
tgtatacccc 
cctgactgag 
atatagccca 
tgcccctgtc 
ggaagaaaca 
tgcaggacac 
gaacctgcag 
aagatatcca 
gaccgggaca 
cctgtgtgtc 
gaggggcaga 
tttccctttt 
gttttatttt 
ctggagtgta 
ctcctgcctc 
gactggtggg 
acccgcctcg 
tctggcgttt 
ggttggctct 
ggcgggtgac 
agcccagtgt 
ccacccggat 
tgagggaaca 
actttgagca 
aaaagtctct 
ttggcatagt 
tgccattcat 
ttctagggta 
cctaggaggg 
aaaaatgaag 
gacaaggtgg 
aaccccgtct 
gcagctactc 
tgagctgaga 
gatcgatcaa 
tgatgtctgg 
ggagcaccgc 
cttgtctggg 
gagagagggg 
gtgagatggg 
aaagtaaaaa 
tacaaagatt 
tttttatata 
ataaatactt 
cacaatacaa 
ttgagtgcag 
ctcctgcctc 
tttttgtatt 
gacctcaggt 
ctgtactcgg 
gcccaggcta 
cacgcccggc 



caggaagatc 
tctctactaa 
ctaaggaggc 
tgactgtacc 
taaataaata 
aaaggcggcc 
ttagacctgg 
ctaatgcttc 
ctgctgtgtg 
ctgtacccgc 
cactcccgca 
ccaaagcgat 
ccagaatata 
gcttggagcg 
tgtgggagtc 
cccgtcctcc 
cccagggggc 
actgtctccc 
cttctccatt 
tgttggaaaa 
ttgccctgat 
tgcccgacag 
agccaagttc 
ccaggcctcg 
ttatttatat 
gttgtgcgat 
agtctcctga 
tttcaccgtg 
gcctcccaaa 
tagattctgg 
cactaggccc 
aggctggggc 
cctgcaggtg 
ccgcatcact 
gtggggccca 
agttgcctaa 
atcccataag 
atcagtcacc 
tcagtcactc 
tacagcaggg 
cacatccata 
cacagggaga 
gcagatcact 
ctactaaaaa 
aggaggctga 
tcgggcatca 
tcaatcaatc 
gcagggactg 
aggctggggg 
gagcagtagg 
ggcaggcaga 
aagttattga 
ataaaaaaat 
tcctgtatac 
tatatatgca 
gagtatttcc 
atattaaaac 
tggcacaatc 
agcctcctga 
tgtagtagag 
gattcacctg 
ccaaaaccag 
gagtgcagtg 
taactttttg 
18661 tatttttagt atttttagta gtgatggggt ttcaccatgt tggccaggct ggtcttgaac 
18721 tcctgacctc gggtaatcca cccacctcgg cttcccaaag tgctgggatt acaggcgtga 
18781 gcaccagcac ctggcccaaa accaggaaat taatgatgat acaatattat tgtctaatct 
18841 atagacctta ttcaaatttt tgttagtctt gctaatgtct tttataggga aaaaaaaaaa 
18901 aaaaagcgtg tttctcaccc aggattcaat gaaggatctt tctttgtctt ctatgacctt 
18961 gacatgtctg atgagtgcag tctggttatt ttgtacactg gccctgaatc cgggtttgtc 
19021 taaggtttcc tcacggtcag gttcgggctc agtggtgcca tgtccttctt ggtgcatcct 
19081 gttaactggc acatgagaac aatttgtctc atatgtggtg agtctaactc tgacctcttg 
19141 aggaaggcaa tgtctgccaa gtttcttgct gtaacttctg tttttccctt tgtaattaat 
19201 aagaatctgg taaagagaca ctttgatgtt tttttttttt tttttttttg tgatggagtc 
19261 tccctctatc acccgggctg gagtgtgtgg tgcgatctcg gctcactgca acctccatcc 
19321 cccaggttca agtgattctc ctgcctcagc ctcccaagta gcagggatta caggcatgtg 
19381 ccaccacacc cagctaattt ttgtattttt agtagagatg gggtttcacc atgttggcca 
19441 ggatggtctc gaactcctga ccttgtgatc cgtctgcctc agcctcccaa agtgctggga 
19501 ttacaggtgt gagccaatac gcctggccta ctttgatatt ttgtattctg tttgcatcaa 
19561 aaccttctcc caactagggt gactaccaaa tggcacttat ctaattctgt cattccttct 
19621 acatttgtta gttactttat tgctttcctt cctttcattc tatcagtgtg gacttaagga 
19681 tccttacttt attctaaggg ttcacctttt ttttcttttt ttttgagatg gagtttcgcc 
19741 catgttgccc aggctggatg gagtgcaatg gcgtgatctc ggctcactgc aacgtcctcc 
19801 tcccaggttc aagcaattct cctgcctcag cctcctgagt agctgggatt acaggcatgt 
19861 gccaccacgc ctggctaatt ttttgtattt ttagtagaga cagggtttca ccatgttggc 
19921 caggctggtc tcgaactcct gacctcaggt gatccgcccg cctcagcctt ccaaggttct 
19981 gggattataa gcgtgagctc taccgtgcca ggccatactt tgttactact gttatttttc 
20041 ctgatgctca gatgatccca agtttggcct gtggaagtcc cttcaagctg gcttctgtga 
20101 cttggggaga tgttctgtca ttctttgagt actttctttc tttctggcac agcaaaatga 
20161 ttcaggttaa tcctactttc cttactgtag tgttggaacc agccatttct ccagggaacc 
20221 cttgtagtca agagtggaat ttagaactga gatctgggtg ctggcgtgtg cacattgcta 
20281 gtgggatgtc attacttcta ggctctctta gtggacagaa ccagaaaaaa attatatgat 
20341 gcatatacca atatctctat catctatata aaaaaccatg agttcctact gaaacctcca 
20401 attccattct aacaccacag gattaatttt agcttttcct tttccatatt tgtaactctc 
20461 tctgttgaca gtgagaaacc tgaccctcat tatctgtaat gcatttgcct atttgaacaa 
20521 tactagaata tagtttcaaa atcctccatc cataacacta ttaaaaccaa tcctatggct 
20581 gggctcagcc cactgcaacc tctgcctcct ggactcaagc cagcctccca ctttagcctc 
20641 ccgagtagcc agggctacag gcacacacca ccatgcccag ctaatttttg tattttttgt 
20701 agagactggg tctcactgtg ttgcccagac aggtcttgaa ctctgagctc aagtgatcca 
20761 tccaactcag cctcccaaag tgctaggatt acaggtgtga gtcaccatgc ctggcctctc 
20821 ctagtaaatt tttagaagtg gtgttgttag gtcaaaaggc aaacatgtat gtcatttttt 
20881 agagattttt aaatttcttt ccataagggt tgtaccagtt tgcatttcca tcacagtgta 
20941 tgagaatgcc tgtttcccca caaccttgcc aaaagaatgt cacagtttaa attttaccaa 
21001 tctgagaggt gagaaatagt acctgaaatt gtttaacgga catcttcaaa ttgaaattga 
21061 ggttgacaac gaatcatagt taggaccttt tttttttttt tttttgagtg ggtctcctcg 
21121 tcaccaagct gagtgcatgg cacgatttgc tcactgcaac ttccgccttc tgggttcaag 
21181 cgattctcct gcttcagcct cccaagcagc tgggactcca ggcgcgagtc accatgcccg 
21241 ctaatttttg tatttttagt agagacaggg ttttaccaga ttggccaggc tggtctcgaa 
21301 ctccttacct tgtgatcctc ccgcctcggc ctcccaaagt gctgagatta caggcatgag 
21361 ccaccacgcc tggcctaagg accattttta tataattttt tttttgagac agagtcttgc 
21421 tttgtcaccc aggctggagt gcaatggtgc aatcttggct cactgcagcc tccacttccc 
21481 tggttcaagt gattctcctg cctcagcctc ccgagtagct ggttccacag gtgcgtgcct 
21541 ggctagtatt tgtattatat aatttttttg tgaattgtct cttcatggtt ttttgcccat 
21601 tttttggtcc ctttcttatc aatttttgtg agttcttcgt atttatatta ggcctttatt 
21661 tgtgatatac attgcaaatg ttttctccta gtttgtcagt ttttttaacc tcatgtataa 
21721 tttttctggc catgcagttt aaaaaattac taggtagtca aatttatcaa tcattattta 
21781 taaatctggt ttgaacagag ataaactttc ctggccaagt gtggtgttta cacctgtaat 
21841 cccagcactc tgagaggctg aggtggggat cacctgaggt cagaagttca agaccagcct 
21901 ggccaacatg gtgaaaccct gtctctacta aaaatacaaa aattagctgg gcgtggtggc 
21961 tgatgcctgt agtcccagct actcaggaga ctgaggctgg agaattgctt gaacctggga 
22021 ggcggaggtt gcagtgagca gagatcgtgc cgctgcactc cagcctgggt gacagagcaa 
22081 gactctgtct caaaaacaaa acgacaaaaa acaacaacag aaaagccttt cctgatagct 
22141 aggtcattga ggaattcact catgttttct tctagtacct gatttcattt ttctgcactt 
22201 agattcctga ctcatatgga gtttattttt gtatctgatg tgaggcatag atctaattta 
22261 ttattttcca aatggctaac tagctgtctc taaacccttt attaaaaatt attggccaag 
22321 tgcggtagcc acacctgtaa tcccagcagt ttggaaggct gaggcaggat tgcttgaggc 
22381 caggaattca aaaccagccc agacaacata gcaagaccct gtctctacaa gaaaatattg 
22441 gtcaggtgtg gtggctcacg cctataatcc cagcactttg ggaggctgag gcaggtggat 
ggagtgtggt 
tcctctctta 
gtgatccacc 
tgggttcagg 
aggtagagcc 
agaatgactg 
ggctatgagc 



ggagatagag 
attagctggg 
gaatcatttg 
agcctggcga 
aattagctgg 
gaggattgct 
ccagcctggg 
ggtggctcac 
tcaggagttc 
aattagccag 
agaatcgctt 
cagcctgggc 
gcctcccgag 
gagagggttt 
cggcttgggc 
ctgttgggtg 
agctcgagct 
gacaagattt 
gaggggtaag 
tatgctcctg 



accatcctgg 
tgtggtggcg 
aacccaggag 
cagagcaaga 
gcatggtggc 
tgagcctagg 
caacagagtg 
gcctgtaatc 
aagaccagcc 
acatggaggc 
gaacctggga 
aacaagagta 
ttgggaggcc 
ggtgaaaccc 
taatcccagc 
tgcagtgagc 
caaaatcaat 
acattctaga 
gctgcccagg 
ccaaactgct 
tttatttttt 
ttacctaggc 
gtgttgggat 
agagtatgct 
tctgtctgtg 
aaggctaatg 
tgtttttcca 
attttattgg 
ttgagtcatt 
tctttcttga 
gtatttgtaa 
aacaaatatc 
agatggagtc 
aacctccacc 
acaggtgcac 
accatgttag 
caaagtgctg 
acagagtctt 
gctccacctc 
aggtgtacac 
gttcgccagg 
tgctggtatt 
ttcattaatt 
tgctgttctc 
ggagtgcagt 
tcttcttgct 
aatttttttt 
atcccggctc 
tcctgagtag 
gtagagacag 
gcccgccttg 
gtgtggtttt 
atacgtgggg 
ctaacttttt 
taattttttc 
cggctcactg 
tagctgggac 
tgccatattg 
atgagccacc 
gagaatagac 
agaagtggtg 
gctaggattg 
tggactctca 
agcacagacg 



ccaacatggt 
catgcctgta 
gtggaggttg 
ctccatctca 
atgtgccttg 
agttcaatac 
agaccctgtt 
ctagcacttt 
tgaccaacat 
acatgtctgt 
gacggaggtt 
aaaactccgt 
gaggtgggtg 
catctctact 
tacttgggag 
cgagatggtg 
caatcaatca 
ttgtatctta 
ctgatttcaa 
gggattacag 
cctgcatcca 
tggtctcgaa 
taccagcatg 
taaggatgag 
tacagcgtca 
ttctctttta 
tatgaacttt 
gattatgaca 
ttatccaagg 
ctgccgcaat 
tatcttattt 
tagggatttg 
tcactctgtc 
tcctgggttc 
gccaccatgc 
ccaggctggt 
ggattacagg 
gctctgtcac 
ccgggttcac 
atggtctcga 
tctgcttttt 
ggcacagtca 
tcagcctccc 
tgtaggtggg 
agaagggttt 
ccagctgtgt 
gccgttctct 
aaaagcaact 
tttttaattt 
ctcactctgt 
ctcccaggtg 
tgccaccatg 
tcttgaactg 
cggtccaact 
caaagaatga 
ggatttgggg 
gtgaggaagt 
ctcgtgaagg 
ttcaaggtta 
26341 gaagggaaac gaggggatgc ctgtgaaggt gacagtgggg gaccctttgt catgaaggta 
26401 agcttctcta aagcccaggg cctggtgaac acatcttctg ggggtgggga gaaactctag 
26461 tatccagaaa cagttgcctg gcagaggaat actgatgtga ccttgaactt gactctattg 
26521 gaaacctcat ctttcttctt cagagcccct ttaacaaccg ctggtatcaa atgggcatcg 
26581 tctcatgggg tgaaggctgt gaccgggatg ggaaatatgg cttctacaca catgtgttcc 
26641 gcctgaagaa gtggatacag aaggtcattg atcagtttgg agagtagggg gccactcata 
26701 ttctgggctc ctggaaccaa tcccgtgaaa gaattatttt tgtgtttcta aaactatggt 
26761 tcccaataaa agtgactctc agcgagcctc aatgctccca gtgctattca tgggcagctc 
26821 tctgggctca ggaagagcca gtaatactac tggataaaga agacttaaga atccaccacc 
26881 tggtgcacgc tggtagtccg agcactcggg aggctgaggt gggaggat 

LOCUS       HUMPMG3BA    3997 bp    mRNA            PRI       08-JAN-1995
DEFINITION  Human platelet membrane glycoprotein IIIa beta subunit mRNA,
            complete cds.
ACCESSION   M20311
NID         g190107
KEYWORDS    cell membrane glycoprotein; platelet membrane glycoprotein IIIa.
SOURCE      Homo sapiens cDNA to mRNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 3997)
  AUTHORS   Zimrin,A.B., Eisman,R., Vilaire,G., Schwartz,E., Bennett,J.S. and
            Poncz,M.
  TITLE     Structure of platelet glycoprotein IIIa. A common subunit for two
            different membrane receptors
  JOURNAL   J. Clin. Invest. 81 (5), 1470-1475 (1988)
  MEDLINE   88213696
FEATURES             Location/Qualifiers
     source          1..3997
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /cell_type="erythroleukemia"
                     /map="17q21.32"
     sig_peptide     17..94
                     /gene="ITGB3"
                     /note="G00-120-013"
     CDS             17..2383
                     /gene="ITGB3"
                     /codon_start=1
                     /db_xref="GDB:G00-120-013"
                     /product="glycoprotein IIIa"
                     /db_xref="PID:g190108"
                     /translation="MRARPRPRPLWATVLALGALAGVGVGGPNICTTRGVSSCQQCLA
                     VSPMCAWCSDEALPLGSPRCDLKENLLKDNCAPESIEFPVSEARVLEDRPLSDKGSGD
                     SSQVTQVSPQRIALRLRPDDSKNFSIQVRQVEDYPVDIYYLMDLSYSMKDDLWSIQNL
                     GTKLATQMRKLTSNLRIGFGAFVDKPVSPYMYISPPEALENPCYDMKTTCLPMFGYKH
                     VLTLTDQVTRFNEEVKKQSVSRNRDAPEGGFDAIMQATVCDEKIGWRNDASHLLVFTT
                     DAKTHIALDGRLAGIVQPNDGQCHVGSDNHYSASTTMDYPSLGLMTEKLSQKNINLIF
                     AVTENVVNLYQNYSELIPGTTVGVLSMDSSNVLQLIVDAYGKIRSKVELEVRDLPEEL
                     SLSFNATCLNNEVIPGLKSCMGLKIGDTVSFSIEAKVRGCPQEKEKSFTIKPVGFKDS
                     LIVQVTFDCDCACQAQAEPNSHRCNNGNGTFECGVCRCGPGWLGSQCECSEEDYRPSQ
                     QDECSPREGQPVCSQRGECLCGQCVCHSSDFGKITGKYCECDDFSCVRYKGEMCSGHG
                     QCSCGDCLCDSDWTGYYCNCTTRTDTCMSSNGLLCSGRGKCECGSCVCIQPGSYGDTC
                     EKCPTCPDACTFKKECVECKKFDREPYMTENTCNRYCRDEIESVKELKDTGKDAVNCT
                     YKNEDDCVVRFQYYEDSSGKSILYVVEEPECPKGPDILVVLLSVMGAILLIGLAALLI
                     WKLLITIHDRKEFAKFEEERARAKWDTANNPLYKEATSTFTNITYRGT"



FIG. 29A 




WO 99/50454 



PCT/US99/06473 



     gene            17..2383
                     /gene="ITGB3"
     mat_peptide     95..2380
                     /gene="ITGB3"
                     /note="G00-120-013"
                     /product="glycoprotein IIIa beta subunit"
BASE COUNT      917 a    993 c   1099 g    988 t
ORIGIN      Chromosome 17.

1 gcgggaggcg gacgagatgc gagcgcggcc gcggccccgg ccgctctggg cgactgtgct 
61 ggcgctgggg gcgctggcgg gcgttggcgt aggagggccc aacatctgta ccacgcgagg 
121 tgtgagctcc tgccagcagt gcctggctgt gagccccatg tgtgcctggt gctctgatga 
181 ggccctgcct ctgggctcac ctcgctgtga cctgaaggag aatctgctga aggataactg 
241 tgccccagaa tccatcgagt tcccagtgag tgaggcccga gtactagagg acaggcccct 
301 cagcgacaag ggctctggag acagctccca ggtcactcaa gtcagtcccc agaggattgc 
361 actccggctc cggccagatg attcgaagaa tttctccatc caagtgcggc aggtggagga 
421 ttaccctgtg gacatctact acttgatgga cctgtcttac tccatgaagg atgatctgtg 
481 gagcatccag aacctgggta ccaagctggc cacccagatg cgaaagctca ccagtaacct 
541 gcggattggc ttcggggcat ttgtggacaa gcctgtgtca ccatacatgt atatctcccc 
601 accagaggcc ctcgaaaacc cctgctatga tatgaagacc acctgcttgc ccatgtttgg 
661 ctacaaacac gtgctgacgc taactgacca ggtgacccgc ttcaatgagg aagtgaagaa 
721 gcagagtgtg tcacggaacc gagatgcccc agagggtggc tttgatgcca tcatgcaggc 
781 tacagtctgt gatgaaaaga ttggctggag gaatgatgca tcccacttgc tggtgtttac 
841 cactgatgcc aagactcata tagcattgga cggaaggctg gcaggcattg tccagcctaa 
901 tgacgggcag tgtcatgttg gtagtgacaa tcattactct gcctccacta ccatggatta 
961 tccctctttg gggctgatga ctgagaagct atcccagaaa aacatcaatt tgatctttgc 
1021 agtgactgaa aatgtagtca atctctatca gaactatagt gagctcatcc cagggaccac 
1081 agttggggtt ctgtccatgg attccagcaa tgtcctccag ctcattgttg atgcttatgg 
1141 gaaaatccgt tctaaagtag agctggaagt gcgtgacctc cctgaagagt tgtctctatc 
1201 cttcaatgcc acctgcctca acaatgaggt catccctggc ctcaagtctt gtatgggact 
1261 caagattgga gacacggtga gcttcagcat tgaggccaag gtgcgaggct gtccccagga 
1321 gaaggagaag tcctttacca taaagcccgt gggcttcaag gacagcctga tcgtccaggt 
1381 cacctttgat tgtgactgtg cctgccaggc ccaagctgaa cctaatagcc atcgctgcaa 
1441 caatggcaat gggacctttg agtgtggggt atgccgttgt gggcctggct ggctgggatc 
1501 ccagtgtgag tgctcagagg aggactatcg cccttcccag caggacgaat gcagcccccg 
1561 ggagggtcag cccgtctgca gccagcgggg cgagtgcctc tgtggtcaac gtgtctgcca 
1621 cagcagtgac tttggcaaga tcacgggcaa gtactgcgag tgtgacgacc tctcctgtgt 
1681 ccgctacaag ggggagatgt gctcaggcca tggccagtgc agctgtgggg actgcctgtg 
1741 tgactccgac tggaccggct actactgcaa ctgtaccacg cgtactgaca cctgcatgtc 
1801 cagcaatggg ctgctgtgca gcggccgcgg caagtgtgaa tgtggcagct gtgtctgtat 
1861 ccagccgggc tcctatgggg acacctgtga gaagtgcccc acctgcccag atgcctgcac 
1921 ctttaagaaa gaatgtgtgg agtgtaagaa gtttgaccgg gagccctaca tgaccgaaaa 
1981 tacctgcaac cgttactgcc gtgacgagat tgagtcagtg aaagagctta aggacactgg 
2041 caaggatgca gtgaattgta cctataagaa tgaggatgac tgtgtcgtca gattccagta 
2101 ctatgaagat tctagtggaa agtccatcct gtatgtggta gaagagccag agtgtcccaa 
2161 gggccctgac atcctggtgg tcctgctctc agtgatgggg gccattctgc tcattggcct 
2221 tgccgccctg ctcatctgga aactcctcat caccatccac gaccgaaaag aattcgctaa 
2281 atttgaggaa gaacgcgcca gagcaaaatg ggacacagcc aacaacccac tgtataaaga 
2341 ggccacgtct accttcacca atatcacgta ccggggcact taatgataag cagtcatcct 
2401 cagatcatta tcagcctgtg ccacgattgc aggagtccct gccatcatgt ttacagagga 
2461 cagtatttgt ggggagggat ttggggctca gagtggggta ggttgggaga atgtcagtat 
2521 gtggaagtgt gggtctgtgt gtgtgtatgt gggggtctgt gtgtttatgt gtgtgtgttg 
2581 tgtgtgggag tgtgtaattt aaaattgtga tgtgtcctga taagctgagc tccttagcct 
2641 ttgtcccaga atgcctcctg cagggattct tcctgcttag cttgagggtg actatggagc 
2701 tgagcaggtg ttcttcatta cctcagtgag aagccagctt tcctcatcag gccattgtcc 
2761 ctgaagagaa gggcagggct gaggcctctc attccagagg aagggacacc aagccttggc 
2821 tctaccctga gttcataaat ttatggttct caggcctgac tctcagcagc tatggtagga 
2881 actgctgggc ttggcagccc gggtcatctg tacctctgcc tcctttcccc tccctcaggc 
2941 cgaaggagga gtcagggaga gctgaactat tagagctgcc tgtgcctttt gccatcccct 
3001 caacccagct atggttctct cgcaagggaa gtccttgcaa gctaattctt tgacctgttg 
3061 ggagtgagga tgtctgggcc actcaggggt cattcatggc ctgggggatg taccagcatc 
3121 tcccagttca taatcacaac ccttcagatt tgccttattg gcagctctac tctggaggtt 
3181 tgtttagaag aagtgtgtca cccttaggcc agcaccatct ctttacctcc taattccaca 
3241 ccctcactgc tgtagacatt tgctatgagc tggggatgtc tctcatgacc aaatgctttt 
3301 cctcaaaggg agagagtgct attgtagagc cagaggtctg gccctatgct tccggcctcc 
3361 tgtccctcat ccatagcacc tccacatacc tggccctgag ccttggtgtg ctgtatccat 
3421 ccatggggct gattgtattt accttctacc tcttggctgc cttgtgaagg aattattccc 
3481 atgagttggc tgggaataag tgccaggatg gaatgatggg tcagttgtat cagcacgtgt 
3541 ggcctgttct tctatgggtt ggacaacctc attttaactc agtctttaat ctgagaggcc 
3601 acagtgcaat tttattttat ttttctcatg atgaggtttt cttaacttaa aagaacatgt 
3661 atataaacat gcttgcatta tatttgtaaa tttatgtgta tggcaaagaa ggagagcata 
3721 ggaaaccaca cagacttggg cagggtacag acactcccac ttggcatcat tcacagcaag 
3781 tcactggcca gtggctggat ctgtgagggg ctctctcatg atagaaggct atggggatag 
3841 atgtgtggac acattggacc tttcctgagg aagagggact gttcttttgt cccagaaaag 
3901 cagtggctcc attggtgttg acatacatcc aacattaaaa gccaccccca aatgcccaag 
3961 aaaaaaagaa agacttatca acatttgttc catgagg 

LOCUS       HUMATH3A3     238 bp    DNA             PRI
DEFINITION  Human antithrombin III (ATIII) gene, exon 6.
ACCESSION   M21645
NID         g179149
KEYWORDS    antithrombin; antithrombin III.
SEGMENT     3 of 3
SOURCE      Homo sapiens (individual_isolate Patient II-9) DNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 238)
  AUTHORS   Bock,S.C., Marrinan,J.A. and Radziejewska,E.
  TITLE     Antithrombin III Utah: proline-407 to leucine mutation in a highly
            conserved region near the inhibitor reactive site [published
            erratum appears in Biochemistry 1989 Apr 18;28(8):3628]
  JOURNAL   Biochemistry 27 (16), 6171-6178 (1988)
  MEDLINE   89050967
COMMENT     Draft entry and computer-readable sequence [1] kindly submitted by
            S.C.Bock, 20-JAN-1989.
FEATURES             Location/Qualifiers
     source          1..238
                     /organism="Homo sapiens"
                     /isolate="Patient II-9"
                     /db_xref="taxon:9606"
                     /cell_type="peripheral blood cell"
                     /map="1q23-q25.1"
     gene            join(M21643:1..398,M21644:1..469,1..183)
                     /gene="AT3"
     intron          <1..6
                     /gene="AT3"
                     /note="antithrombin III, intron F"
     CDS             <7..183
                     /gene="AT3"
                     /note="exon 6"
                     /codon_start=1
                     /db_xref="GDB:G00-119-024"
                     /product="antithrombin III"
                     /db_xref="PID:g179152"
                     /translation="VNEEGSEAAASTAVVIAGRSLNPNRVTFKANRPFLVFIREVPLN
                     TIIFMGRVANPCVK"
BASE COUNT       63 a     50 c     53 g     72 t
ORIGIN      About 7.8 kb from segment 3B; chromosome 1q23.
1 ctgcaggtaa atgaagaagg cagtgaagca gctgcaagta ccgctgttgc gattgctggc 
61 cgttcgctaa accccaacag ggtgactttc aaggccaaca ggcctttcct ggtttttata 
121 agagaagttc ctctgaacac tattatcttc atgggcagag tagccaaccc ctgtgttaag 
181 taaaatgttc ttattctttg cacctcttcc tatttttggt ttgtgaacag aagtaaaa 

LOCUS       HUMGP2B2      623 bp    DNA             PRI       08-NOV-1994
DEFINITION  Human platelet glycoprotein IIb mRNA, C-terminal exon
ACCESSION   M22569
NID         g183449
KEYWORDS    platelet glycoprotein IIb.
SEGMENT     2 of 2
SOURCE      Homo sapiens (tissue library: lambda-EMBL 4) DNA
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 623)
  AUTHORS   Prandini,M.H., Denarier,E., Frachet,P., Uzan,G. and Marguerie,G.
  TITLE     Isolation of the human platelet glycoprotein IIb gene and
            characterization of the 5' flanking region
  JOURNAL   Biochem. Biophys. Res. Commun. 156 (1), 595-601 (1988)
  MEDLINE   89025907
COMMENT     Draft entry and computer-readable sequence [1] kindly submitted by
            M.H.Prandini, 16-FEB-1989.
FEATURES             Location/Qualifiers
     source          1..623
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /cell_type="leucocyte"
                     /tissue_lib="lambda-EMBL 4"
                     /map="17q21.32"
     gene            join(M22568:1254..1869,1..434)
                     /gene="ITGA2B"
     intron          <1..191
                     /gene="ITGA2B"
                     /note="G00-120-012"
     exon            192..434
                     /partial
                     /gene="ITGA2B"
                     /note="last exon; G00-120-012"
     CDS             <192..251
                     /gene="ITGA2B"
                     /codon_start=1
                     /db_xref="GDB:G00-120-012"
                     /product="platelet glycoprotein IIb"
                     /db_xref="PID:g463108"
                     /translation="VGFFKRNRHTLEEDDEEGE"
BASE COUNT      144 a    158 c    181 g    140 t
ORIGIN      About 15 kb after segment 1.

1 aaaactcagg aagaaacaaa cccaccaatc gttccaggca tatctcaaat gcaaaaggca 
61 tccattgtga gtacagtggg ctttcatgtt ctgcgctggt ccagggaggt gctcatagct 
121 acttcctcac atgtgctctg gggccagcaa atcatctgta taccctgacc ttggcccccg 
181 tgtaccccca ggtcggcttc ttcaagcgga accggcacac cctggaagaa gatgatgaag 
241 agggggagtg atggtgcagc ctacactatt ctagcaggag ggttgggcgt gctacctgca 
301 ccgccccttc tccaacaagt tgcctccaag ctttgggttg gagctgttcc attgggtcct 
361 cttggtgtcg tttccctccc aacagagctg ggctaccccc cctcctgctg cctaataaag 
421 agactgagcc ctgatgctga gcatgctgcc tccttttggg gccagagaag agagtaccga 
481 agaatgtttt ggacggggac ctagggctgg tggaagtatg aacgagagag tcactgccag 
541 ggcgaagttt gcaaatcact gtctttgggg agtgtcaggg agtacagagt tggggtggta 
601 ggtgtaacag aagacggaga gcc 

LOCUS       HUMCETP      1787 bp    mRNA            PRI       01-NOV-1994
DEFINITION  Human cholesteryl ester transfer protein mRNA, complete cds.
ACCESSION   M30185
NID         g180259
KEYWORDS    cholesteryl ester transfer protein; transfer protein.
SOURCE      Human adult liver, cDNA to mRNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1787)
  AUTHORS   Drayna,D., Jarnagin,A.S., McLean,J., Henzel,W., Kohr,W.,
            Fielding,C. and Lawn,R.
  TITLE     Cloning and sequencing of human cholesteryl ester transfer protein
            cDNA
  JOURNAL   Nature 327 (6123), 632-634 (1987)
  MEDLINE   87258172
FEATURES             Location/Qualifiers
     source          1..1787
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /dev_stage="adult"
                     /tissue_type="liver"
     mRNA            <1..1787
                     /note="CETP mRNA"
     sig_peptide     131..181
                     /gene="CETP"
                     /note="cholesteryl ester transfer protein signal peptide"
     gene            131..1612
                     /gene="CETP"
     CDS             131..1612
                     /gene="CETP"
                     /note="cholesteryl ester transfer protein precursor"
                     /codon_start=1
                     /db_xref="GDB:G00-119-773"
                     /db_xref="PID:g180260"
                     /translation="MLAATVLTLALLGNAHACSKGTSHEAGIVCRITKPALLVLNHET
                     AKVIQTAFQRASYPDITGEKAMMLLGQVKYGLHNIQISHLSIASSQVELVEAKSIDVS
                     IQNVSVVFKGTLKYGYTTAWWLGIDQSIDFEIDSAIDLQINTQLTCDSGRVRTDAPDC
                     YLSFHKLLLHLQGEREPGWIKQLFTNFISFTLKLVLKGQICKEINVISNIMADFVQTR
                     AASILSDGDIGVDISLTGDPVITASYLESHHKGHFIYKNVSEDLPLPTFSPTLLGDSR
                     MLYFWFSERVFHSLAKVAFQDGRLMLSLMGDEFKAVLETWGFNTNQEIFQEVVGGFPS
                     QAQVTVHCLKMPKISCQNKGVVVNSSVMVKFLFPRPDQQHSVAYTFEEDIVTTVQASY
                     SKKKLFLSLLDFQITPKTVSNLTESSSESIQSFLQSMITAVGIPEVMSRLEVVFTALM
                     NSKGVSLFDIINPEIITRDGFLLLQMDFGFPEHLLVDFLQSLS"
     mat_peptide     182..1609
                     /gene="CETP"
                     /note="cholesteryl ester transfer protein"
BASE COUNT      397 a    531 c    456 g    403 t
ORIGIN
1 gtgaatctct ggggccagga agaccctgct gcccggaaga gcctcatgtt ccgtgggggc 
61 tgggcggaca tacatatacg ggctccaggc tgaacggctc gggccactta cacaccactg 
121 cctgataacc atgctggctg ccacagtcct gaccctggcc ctgctgggca atgcccatgc 
181 ctgctccaaa ggcacctcgc acgaggcagg catcgtgtgc cgcatcacca agcctgccct 
241 cctggtgttg aaccacgaga ctgccaaggt gatccagacc gccttccagc gagccagcta 
301 cccagatatc acgggcgaga aggccatgat gctccttggc caagtcaagt atgggttgca 
361 caacatccag atcagccact tgtccatcgc cagcagccag gtggagctgg tggaagccaa 
421 gtccattgat gtctccattc agaacgtgtc tgtggtcttc aaggggaccc tgaagtatgg 
481 ctacaccact gcctggtggc tgggtattga tcagtccatt gacttcgaga tcgactctgc 
541 cattgacctc cagatcaaca cacagctgac ctgtgactct ggtagagtgc ggaccgatgc 
601 ccctgactgc tacctgtctt tccataagct gctcctgcat ctccaagggg agcgagagcc 
661 tgggtggatc aagcagctgt tcacaaattt catctccttc accctgaagc tggtcctgaa 
721 gggacagatc tgcaaagaga tcaacgtcat ctctaacatc atggccgatt ttgtccagac 
781 aagggctgcc agcatccttt cagatggaga cattggggtg gacatttccc tgacaggtga 
841 tcccgtcatc acagcctcct acctggagtc ccatcacaag ggtcatttca tctacaagaa 
901 tgtctcagag gacctccccc tccccacctt ctcgcccaca ctgctggggg actcccgcat 
961 gctgtacttc tggttctctg agcgagtctt ccactcgctg gccaaggtag ctttccagga 
1021 tggccgcctc atgctcagcc tgatgggaga cgagttcaag gcagtgctgg agacctgggg 
1081 cttcaacacc aaccaggaaa tcttccaaga ggttgtcggc ggcttcccca gccaggccca 
1141 agtcaccgtc cactgcctca agatgcccaa gatctcctgc caaaacaagg gagtcgtggt 
1201 caattcttca gtgatggtga aattcctctt tccacgccca gaccagcaac attctgtagc 
1261 ttacacattt gaagaggata tcgtgactac cgtccaggcc tcctattcta agaaaaagct 
1321 cttcttaagc ctcttggatt tccagattac accaaagact gtttccaact tgactgagag 
1381 cagctccgag tccatccaga gcttcctgca gtcaatgatc accgctgtgg gcatccctga 
1441 ggtcatgtct cggctcgagg tagtgtttac agccctcatg aacagcaaag gcgtgagcct 
1501 cttcgacatc atcaaccctg agattatcac tcgagatggc ttcctgctgc tgcagatgga 
1561 ctttggcttc cctgagcacc tgctggtgga tttcctccag agcttgagct agaagtctcc 
1621 aaggaggtcg ggatggggct tgtagcagaa ggcaagcacc aggctcacag ctggaaccct 
1681 ggtgtctcct ccagcgtggt ggaagttggg ttaggagtac ggagatggag attggctccc 
1741 aactcctccc tatcctaaag gcccactggc attaaagtgc tgtatcc 
LOCUS       HUMGPIIB2   13204 bp    DNA             PRI       10-NOV-1994
DEFINITION  Human platelet Glycoprotein IIb (GPIIb) gene, exons 2-29.
ACCESSION   M33320
NID         g183506
KEYWORDS    platelet Glycoprotein IIb.
SEGMENT     2 of 3
SOURCE      Human leukocyte DNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 13204)
  AUTHORS   Heidenreich,R., Eisman,R., Surrey,S., Delgrosso,K., Bennett,J.S.,
            Schwartz,E. and Poncz,M.
  TITLE     Organization of the gene for platelet glycoprotein IIb
  JOURNAL   Biochemistry 29 (5), 1232-1244 (1990)
  MEDLINE   90212612
FEATURES             Location/Qualifiers
     source          1..13204
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /map="17q21.32"
     prim_transcript <1..>13204
                     /note="GPIIb mRNA and introns"
     intron          <1..497
                     /note="GPIIb intron A"
     exon            498..619
                     /gene="ITGA2B"
                     /number=2
     intron          620..708
                     /note="GPIIb intron B"
     exon            709..806
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=3
     intron          807..911
                     /note="GPIIb intron C"
     exon            912..1077
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=4
     intron          1078..1292
                     /note="GPIIb intron D"
     exon            1293..1342
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=5
     intron          1343..1418
                     /note="GPIIb intron E (no splice consensus); putative;
                     does not fit consensus"
     exon            1419..1464
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=6
     intron          1465..1551
                     /note="GPIIb intron F"
     exon            1552..1680
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=7
     intron          1681..2041
                     /note="GPIIb intron G"
     exon            2042..2089
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=8
     intron          2090..2244
                     /note="GPIIb intron H (no splice consensus); putative;
                     does not fit consensus"
     exon            2245..2288
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=9
     intron          2289..2460
                     /note="GPIIb intron I"
     exon            2461..2514
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=10
     intron          2515..2652
                     /note="GPIIb intron J"
     exon            2653..2705
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=11
     intron          2706..2896
                     /note="GPIIb intron K"
     exon            2897..3108
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=12
     intron          3109..5535
                     /note="GPIIb intron L"
     exon            5536..5718
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=13
     intron          5719..5951
                     /note="GPIIb intron M"
     exon            5952..5997
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=14
     intron          5998..6105
                     /note="GPIIb intron N"
     exon            6106..6210
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=15
     intron          6211..6294
                     /note="GPIIb intron O"
     exon            6295..6350
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=16
     intron          6351..6442
                     /note="GPIIb intron P"
     exon            6443..6594
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=17
     intron          6595..6782
                     /note="GPIIb intron Q"
     exon            6783..6908
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=18
     intron          6909..7885
                     /note="GPIIb intron R"
     exon            7886..7953
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=19
     intron          7954..8086
                     /note="GPIIb intron S"
     exon            8087..8234
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=20
     intron          8235..8802
                     /note="GPIIb intron T"
     exon            8803..8895
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=21
     intron          8896..9505
                     /note="GPIIb intron U"
     exon            9506..9585
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=22
     intron          9586..10201
                     /note="GPIIb intron V"
     exon            10202..10282
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=23
     intron          10283..10405
                     /note="GPIIb intron W"
     exon            10406..10505
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=24
     intron          10506..10604
                     /note="GPIIb intron X"
     exon            10605..10757
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=25
     intron          10758..10873
                     /note="GPIIb intron Y"
     exon            10874..10999
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=26
     intron          11000..11477
                     /note="GPIIb intron Z"
     exon            11478..11591
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=27
     intron          11592..11827
                     /note="GPIIb intron AA"
     exon            11828..11929
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=28
     intron          11930..12116
                     /note="GPIIb intron BB"
     exon            12117..12233
                     /gene="ITGA2B"
                     /note="platelet Glycoprotein IIb"
                     /number=29
     intron          12234..>13204
                     /note="GPIIb intron CC"
BASE COUNT     3046 a   3579 c   3857 g   2722 t
ORIGIN      About 2000 bp after segment 1.
1 ctgcaggtca acggatctgc tagggtcctc ctatcagcac acacactcca gccccacttt 
61 agaggtaccc gctaccttcc ctcattaaaa ccagctctca agaggggatc tggtaacagt 
121 ctaggcaggc attccaggga gcatgtgaac cgctggttct tgttgcgggt ggaggatgga 
181 ggtgttgtac agagtttagg tctttttcag caaagatctc caaaccccgg gtgttcaaaa 
241 tcaaaccaaa ggggattata gtcccagctc tactcacaac tcactggtta ctttagccac 
301 gagattgccc tcgctgagag tcggtttcac tgtccataag atgaagaagc acatcacggt 
361 ggtctgtgag gtgtcattga ggaaagatgg tccagtgccc ccatgccaca tggccttcgg 
421 gcagtgctcc cagcgccggc gccagggcct gggatacgct ggaatctgcg cggcgctcac 
481 ccagctttcc tatgcagagt ggccatcgtg gtgggcgccc cgcggaccct gggccccagc 
541 caggaggaga cgggcggcgt gttcctgtgc ccctggaggg ccgagggcgg ccagtgcccc 
601 tcgctgctct ttgacctccg tgagtcccag gcaaggagag caaggttggg gtcagaggga 
661 cgtggactgc ccgggcttca gcgccccacc ccttcttgtg ccttccaggt gatgagaccc 
721 gaaatgtagg ctcccaaact ttacaaacct tcaaggcccg ccaaggactg ggggcgtcgg 
781 tcgtcagctg gagcgacgtc attgtggtgg gccccgcggt acagggcaca gggaacaatc 
841 gggggcaggg acactggggc caggaggagc ccaagtctcg cgccccgtcc ccatctgtgg 
901 ccctttctca ggcctgcgcc ccctggcagc actggaacgt cctagaaaag actgaggagg 
961 ctgagaagac gcccgtaggt agctgctttt tggctcagcc agagagcggc cgccgcgccg 
1021 agtactcccc ctgtcgcggg aacaccctga gccgcattta cgtggaaaat gattttagta 
1081 agcgccagct acgacctggc cccgcccact cgcgacggct tggccccgcc ccccatcgga 
1141 tcccgccccc agcgccgcag cccttgcttt ggatctggcc tcgccccagg gccccgccga 
1201 ctcaaggccc cgcccctgtc ccccagccct cctccgggct cgcgcgcgcc tcccttcacc 
1261 cctgggctga cccctcctcc ttgtctcctc aggctgggac aagcgttact gtgaagcggg 
1321 cttcagctcc gtggtcactc aggcgagtag ggagcaaaag cgcagtgggg gcggctccca 
1381 aacagggccc cctctcaccc tcaggacttc ccttccaggc cggagagctg gtgcttgggg 
1441 ctcctggcgg ctattatttc ttaggtacgt gcccatccgt acacctcccc cccttctcgc 
1501 ggccgaagga gaccgctttg ggcttcacac ccgctgtccc tcccgcccta ggtctcctgg 
1561 cccaggctcc agttgcggat attttctcga gttaccgccc aggcatcctt ttgtggcacg 
1621 tgtcctccca gagcctctcc tttgactcca gcaacccaga gtacttcgac ggctactggg 
1681 gtaacaccgc cattccagac ttccagcacc ccgagggtca ccgcccaccg cagacggtca 
1741 ggtcctgccc ctgtgggagc ctccatggcc acccctgccg gccaacccac cgcctaagcc 
1801 gctcccgccc tccgctcctg cgcttccccg cagaccgccc acctcccatg cgcccaccgc 
1861 tcccttccac tgcggactcg tagcgcagcc tggggcaggg cttggcccct cgaaggcctc 
1921 cgtttttcca tctgcacaat gcagggctgg ggctgagtgg ccttaatctc ctccttcttt 
1981 gccctccgtc ccctctgtgc ttcctcccct ggaaaagact aatttgcgcc cttgtcctca 
2041 gggtactcgg tggccgtggg cgagttcgac ggggatctca acactacagg caagaaatcc 
2101 acttagggcg ggagttgggt agcccagccc ggggaggagc gccttcctga aatctcccct 
2161 atgtagctgg gtgcagaacg gggagcggga agtgggtagg ttctaaggct ctcattccct 
2221 gagcctggct ctccctatcg ccagaatatg tcgtcgtgcc ccccacttgg agctggaccc 
2281 tgggagcggt aagtgccccc accactgggc ctcccgaagc cccttatccc agttctcagg 
2341 ctgacaactc ctgagcgccc cccacccccg ccccgcctcc accaaaccac cctttctcac 
2401 ctggagtggg aggttgcttt gggtacaaga atgatgctct cgcctgcgct gtccgtgcag 
2461 gtggaaattt tggattccta ctaccagagg ctgcatcggc tgcgcggaga gcaggtgggg 
2521 gccaggtccc agtgggcgtg gctgggtgga gggggaactg agacttcaga atatttcatg 
2581 ggaggtgagg gcccatttct taaagaggat gcttgtccag cggcgtgaat gatggtgctc 
2641 ctcatcttgc agatggcgtc gtattttggg cattcagtgg ctgtcactga cgtcaacggg 
2701 gatgggtgag gagggacatg cccccacccc tacccagttg ggtcccaaat taccagagct 
2761 gcccctctgt ctccctttcc tagccctagt ctcacgtatc cactggagga acaggagagc 
2821 aagggtcgag gagatttggc cctagcccca atatacccct ggtccagtcc catgtaacca 
2881 ctcatctggc ccacaggagg catgatctgc tggtgggcgc tccactgtat atggagagcc 
2941 gggcagaccg aaaactggcc gaagtggggc gtgtgtattt gttcctgcag ccgcgaggcc 
3001 cccacgcgct gggtgccccc agcctcctgc tgactggcac acagctctat gggcgattcg 
3061 gctctgccat cgcacccctg ggcgacctcg accgggatgg ctacaatggt gagggaagag 
3121 aggagcccta cttgctgcag aggggttaac agccactcaa aaagcatgga gttggcctga 
3181 gggcagccag aaccaggatg ggttttaagc atataagtat gtggcttaga cacatggggt 
3241 gctgagtgga gagcagatgg gagagttgaa gactaattag gaagtgtttg ccttaatcca 
3301 agcaagagac aatgaccacc tggatgtgga ttttggcagt ggagttagag atgggagtga 
3361 cttcacagat atttaggact cggattatta ggacttggtg ggagactgga tgtggggcca 
3421 ggggagaggt tggagttggg tgcctgtgat ggcctccact gcctggaact caggccgtgc 
3481 agcaggtgct ggggagaggc gggagatcag cagttcagct ctggacctgt tgagcttgaa 
3541 gggcttgggt gctttaggcg gaaataccca aagaacagtt gggagtggct ctccccgctt 
3601 ccacaagaga gatctgaatg ggagacaggg gtttggggaa agtggatgag gtcccgggac 
3661 ctgtgaaata agaggcccag 
3721 caggaggtaa gtctgagaag 
gatagagccc tagggagcaa aagcatttag gtgactccta 
gagacagagg agtgtccaga gagggaggag ggaacccagg 
3781 gggtctgatg gcccgggact caaggaagag catgcgttaa agagcatgca caggaggaag
3841 tgggcgctgc agctcctgct gctgctgcaa gatacaatta ggtggggctg gagaaatatt 
3901 catgggcttt agcaagaaga gggtgccagg catggtggct catacctgta atcccagcta 
3961 cttgggaaat tgaagcagga gaatctcttg aacccgggaa gtggaggttg cactgagctg 
4021 agcttgcgcc actactgcac tccagcctgg gtgacagagc aagactccat ctcaacaaaa 
4081 taaaaaaaaa aatagagaaa gaaaggaaga aagaaaaaag aaggggaggt tattggtgac 
4141 agtgacataa attgattcag gccaagatag ggtcagaagc cagaatgcaa tggggtaagg 
4201 tatgaatgga gatgaaaaat tggatgcagc taatgtagac agctctttca acaggtttgt 
4261 ggtaaaaagg aatttgagga atagaaagga aaaaaaaaaa catgtttgac tataagagga 
4321 aaaagagaaa aggtgatcac agaaaagaga tgagggtcaa gggaagatta tttcaatgtg 
4381 gaagaacatg tagtaggttg aaaatgatgt tgtggggaaa tggggggatg agccagcaga 
4441 gagtccctgt gatgcctcag ggggtgggag ggtgactggc ccagtgtcag ggtgaaggaa 
4501 ggaaacctct tccagggtca aatggggaaa gggaaaaaga aagttggtgt gggattatag 
4561 cataacagtg ggctgcctct cttcctgaag taagagatta cgtcacctgc tgaaggaagt 
4621 gtggggggtc tgggagtttg atggaatgga gaaggctaga aatagatgct agatggccag 
4681 gcacggtggc tcacacctgg aatcccagca ctttgggagg ccgaggcagg aggatcactg 
4741 gagcctagga gtttgacacc agcctggcca acatagggag atctcgtctc cataaaaatt 
4801 tttaaaaatt agctgggcat ggtggctata gtctcaactg cttgggaagc tgaggtggga 
4861 ggattgcttt agtccagaag gttgaggctg cagtaagcca tggttgcacc actgcacttc 
4921 agcctgaatg acaagtgcaa gactgtctta aaataaaaaa tttaaagggc ttgggcacgg 
4981 tggctcacac ctgtaatcca gcactttggg agcccaaggt gggcagatca cttgaggtca 
5041 ggagttcgag atcagcctgg ccaatgtggt gaaaccccgt ctctactgaa aatacaaaaa 
5101 ttagccgggc atggtggtag gcgcctgtaa tcccagctac tgaagaggct gaggcacaag 
5161 aatcacttta acgggggagg cagaggctgc agtgagccga gatcgcacca ctgcactcca 
5221 gccaggacaa cagagcgaga ctccatctca aaaaaaaaaa aatttagaaa agggaataat 
5281 gatgcttaat tttcaggata tattttcctc aatagacagt gagagttgtc actgttttta 
5341 taacaatcct acttggcagg tccctctccc acctgattgt taactcctgg agggtagggc 
5401 agtgcctcct tcacccacac tttgcacccc tttcctagtc ccctgggatg ttcccagaga 
5461 agctcaggaa agttttacag tcatctaggg aggctgaata acaatcagcc acttcctttc 
5521 tgttactcct tccagacatt gcagtggctg ccccctacgg gggtcccagt ggccggggcc 
5581 aagtgctggt gttcctgggt cagagtgagg ggctgaggtc acgtccctcc caggtcctgg 
5641 acagcccctt ccccacaggc tctgcctttg gcttctccct tcgaggtgcc gtagacatcg 
5701 atgacaacgg atacccaggt gccctggact gcctccagct agaaatgccc aagaaaggcc 
5761 cttggacatt cgctggaagt gccaagagac acggccaggg ctcatgcctg gcctggtgtc 
5821 ccactatgga ctgccagagg ggctgggtga aacctccagt gggggaggtg gtgtggggaa 
5881 cccctgggaa gatgagatga ggatccccat accctaatcg ccaattctga cccattcctc 
5941 gatgtctata gacctgatcg tgggagctta cggggccaac caggtggctg tgtacaggtg 
6001 agcactggct ccaggggcgg gatggggaag gtcctgtgcc atcaagagga ggccaggcca 
6061 ggaggagcca caatggcaag cctccccatc accctatccc atcagagctc agccagtggt 
6121 gaaggcctct gtccagctac tggtgcaaga ttcactgaat cctgctgtga agagctgtgt 
6181 cctacctcag accaagacac ccgtgagctg gtgaggaggc agagggcatg ggccttaaag 
6241 gatctgggac ctcagaaagg ctccaacccc tgagccccac ttacgtcttt gcagcttcaa 
6301 catccagatg tgtgttggag ccactgggca caacattcct cagaagctat gtgagtggca 
6361 tgaagggggc aggagggagg tgggcttgga ctcccccgga ggctggccag ggaggtcctg
6421 actcttctgc ttgccctgcc agccctaaat gccgagctgc agctggaccg gcagaagccc 
6481 cgccagggcc ggcgggtgct gctgctgggc tctcaacagg caggcaccac cctgaacctg 
6541 gatctgggcg gaaagcacag ccccatctgc cacaccacca tggccttcct tcgagtacgc 
6601 ccaggcaggg gattggcagg gctgggagag tagaacttac ccactggact tgttcatcta 
6661 gccctggggc actgagctgg gtgctgtgag tccgggggtg gtcaggacac aggtgcctac 
6721 tggccaggag aaggtgggat gtgtatggta gcaagatggc ctgactcttg cccctgtcct 
6781 aggatgaggc agacttccgg gacaagctga gccccattgt gctcagcctc aatgtgtccc 
6841 taccgcccac ggaggctgga atggcccctg ctgtcgtgct gcatggagac acccatgtgc
6901 aggagcaggt agggacaggc agggacaggc cagggaggtg caggacccct gatagcaaat 
6961 caggattagg gttagtgcca agtcacaatg taaccccaaa accttgatgt cattccaaac 
7021 cctaatgaaa acctcaaaat ccagccagtc atggtggctc acacctgtaa tcccagcact 
7081 ttgggagacc gaggcaggca gattgcctga ggtcaggagt tagagaccaa cctggccaac 
7141 atggtgaaaa cccatctcta ctaaaaatac aaaaaaaatt agccgggtgt ggtgacgcat 
7201 gcctgtaatt ccagctactc gggaggctga agcaggagaa tcacttgaac ccaggaggca 
7261 gaggttgcag tgagccaaga gtgtgccaca gcactccagc ctgggtgaca gagcaagact 
7321 ctgtctcaaa aaaaaaaaaa aaagccaggc gcagtggcct cacgcctgta atcccagcac 
7381 tttgggaggc caaggcgggt ggatcacgag gtcaggagat caagaccatc ctggctaaca 
7441 cagtgaaacc ccgtctacta aaaatacaaa aaaaaaaaaa aaattagctg ggcgtggtgg 
7501 cgggtacctg tagtcccagc tacttgggag gctgaggcag gagaatggcg tgaaccccgg 
7561 gggcggacgt tgcagtgagc cgagatagtg ccactgcacc ccagcctgga cgacagagcg 
7621 agactccgtc tccaaaaata aaaaaacacc tgaaaatccc agtatcccct aagctctgat 
LOCUS       HUMHCF2     15849 bp    DNA             PRI       08-NOV-1994
DEFINITION  Human heparin cofactor II (HCF2) gene, exons 1 through 5.
ACCESSION   M58600 J05309
NID         g183907
KEYWORDS    heparin cofactor II; serpin.
SOURCE      Human DNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 15849)
  AUTHORS   Herzog,R., Lutz,S., Blin,N., Marasa,J.C., Blinder,M.A. and
            Tollefsen,D.M.
  TITLE     Complete nucleotide sequence of the gene for human heparin
            cofactor II and mapping to chromosomal band 22q11
  JOURNAL   Biochemistry 30 (5), 1350-1357 (1991)
  MEDLINE   91120782
FEATURES             Location/Qualifiers
     source          1..15849
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /map="22q11.2"
     exon            1750..1796
                     /gene="HCF2"
                     /note="G00-120-038"
                     /number=1
                     /product="heparin cofactor II"

                     join(1750..1796,6948..7852,11623..11896,13654..13798,
                     14527..15372)
                     /gene="HCF2"

                     join(1750..1796,6948..7852,11623..11896,13654..13798,
                     14527..15372)
                     /gene="HCF2"
                     /note="G00-120-038"
                     /product="heparin cofactor II"
     exon            6948..7852
                     /gene="HCF2"
                     /note="G00-120-038"
                     /number=2
                     /product="heparin cofactor II"
     CDS             join(6964..7852,11623..11896,13654..13798,14527..14718)
                     /gene="HCF2"
                     /codon_start=1
                     /db_xref="GDB:G00-120-038"
                     /product="heparin cofactor II"
                     /db_xref="PID:g183908"
                     /translation="MKHSLNALLIFLIITSAWGGSKGPLDQLEKGGETAQSADPQWEQ
                     LNNKNLSMPLLPADFHKENTVTNDWIPEGEEDDDYLDLEKIFSEDDDYIDIVDSLSVS
                     PTDSDVSAGNILQLFHGKSRIQRLNILNAKFAFNLYRVLKDQVNTFDNIFIAPVGIST
                     AMGMISLGLKGETHEQVHSILHFKDFVNASSKYEITTIHNLFRKLTHRLFRRNFGYTL
                     RSVNDLYIQKQFPILLDFKTKVREYYFAEAQIADFSDPAFISKTNNHIMKLTKGLIKD
                     ALENIDPATQMMILNCIYFKGSWVNKFPVEMTHNHNFRLNEREVVKVSMMQTKGNFLA
                     ANDQELDCDILQLEYVGGISMLIVVPHKMSGMKTLEAQLTPRVVERWQKSMTNRTREV
                     LLPKFKLEKNYNLVESLKLMGIRMLFDKNGNMAGISDQRIAIDLFKHQGTITVNEEGT
                     QATTVTTVGFMPLSTQVRFTVDRPFLFLIYEHRTSCLLFMGRVANPSRS"
     exon            11623..11896
                     /gene="HCF2"
                     /note="G00-120-038"
                     /number=3
                     /product="heparin cofactor II"
     exon            13654..13798
                     /gene="HCF2"
                     /note="G00-120-038"
                     /number=4
                     /product="heparin cofactor II"
     exon            14527..15372
                     /gene="HCF2"
                     /note="G00-120-038"
                     /number=5
                     /product="heparin cofactor II"
BASE COUNT     4477 a   3814 c   3642 g   3916 t
ORIGIN



1 gggctttgca tgtgtgagaa caagacagag aatgagggag gtgggcccca cgaggagtgt
61 gggcacagac agcagcctct gcctgtggtg ccacgctgaa gactcagtat tgtatgtgac
121 agatgaaggc tctaagaaga cagctctgac aaaagctaga gtgcaaaatc agactcagac
181 acaaccaccg gtctgtgtcc tgaacacaat ggacctttac actctggaat ttctcaaacg
241 gagcaatgca cagacacccc catgggcccc ttgcacaccc gcagattctc ctaggagtca
301 cattctctct tcagatagac tctgggtgcc gacactccca aacatgctct tgaggagcag
361 tctctgtgat aagctgatct tccagacaat ccagaatatt cttaaaactt tttagatcat
421 aaaatttaaa acacaaatta aaaaacaaat tatcataagg ccgggcacag tgactcatgc
481 ctgtaatccc agcactttgc aaggctgaag caggaggatc acttgagccc aagagttcaa
541 gaccagccta ggcaacatag tgagaccctg tctctacaaa aaagtcaaaa gttagctaga
601 catggtggtg tgcacctgta ttcccagcta cttgcagggc tgaggtgagg aggattgctt
661 cagctcggga ggttgaggct gcagtgagcc aagatcacgc cactgcactc cagcctgggt
721 aacagagtga gaccctgtct caaaaaacac atagggccag gcgtggtggc tcacgcatgt
781 aatcccagca ctttgggagg ccgagacggg aggatcactt cactccagga gttcaacacc
841 agcctggcca acatagtgaa accccgtctc tactaaaaat acaaaaaatt agttggacat
901 ggtggtgtgc gcctgtaatc tcagccactc aggaggctga ggcaggagaa cgcttgaact
961 tgggagacag aggttgcagt gagctgagat cgcaccactg cactccagca tgggcagcag
1021 cgcgaaactc tgtctcaaaa caaacaaaca aacaaacaaa cacccataaa cacaaaatgt
1081 atcacagcct cagagatccc cacgaatgcc taagtggccc tgaatttggg aggcactgct
1141 cagtaatagt cctatctgtc ccacaacaga caggagtgct gggctgcacc tactggcaac
1201 aaacacagca acccttgact gaagaaaggt ccatgccaca atccccttat tctgtaagcc
1261 actaattttg tcctctctcc tccacctttc actgaggaac gagctcttgg aaggacaggg
1321 acacccgcct agtagctgag ccagccacat cagtcctgga gagcaggtgg agggcagatg
1381 ctgtgatcat cccagaagag aggacacagt tggaggcaga tgcatggtct ctactttcag
1441 ctaccctcaa tgcagcctgg tccccagagg cctgaagagc gccttgttta tgtggtgacc
1501 tcaagagggg ctgctcctgc accaaggcta tgtgtgcatg ctaacacagt aaccgtcata
1561 tactcaaagt gtcagctcta agaactggag atgaggagct gcaagccact ctacagttat
1621 caaaggcaca gctgaggggg tttgtgctga ccaagctggt tgcctggtgt ttggattggg
1681 acttatttac tttggaaaat atgcagcaac agcccagcac caaagttcac atcaaaatcc
1741 cactgatgac cttggctgct ttcatctctg aagcgccact tctcagaaac acagaggtaa
1801 gttgggtttc taatgtttct gctgattata aattattttt ggtgtttacg gataggcaac
1861 tggttcattt ttctagcaaa ctaagaattc agaagctttc tacactgttt tagaagtggg
1921 aaatggtttc atttttcagt gtgcctatta taaaattgtg tcagttccat tgttgggaga
1981 gttgacaaac ttagaatagg agctgtggaa tagatgaaaa tattgtactt atattaaatt
2041 aatcgaattg gataactgtc ctgtgattat gtatgagaat atccttgctc ttgggtattt
2101 tccctgaagt attagtatta aaggttagag gggccgggtg cagtggctca cgcctgtaat
2161 cccaacactt tgggaggccg aggcgggtgg atcacgaggt caggagttca agaccagcct
2221 gaccaacatg gtgaagccaa gtctctacta aaaatacaaa aattagctgg gcgtggtggc
2281 acgcgcctgt aatcccagct actcaggagg ctgaggcagg agaatcgctt aaacccggga
2341 ggcagaggtt gcagtgagcg gagatcgtgc cactgcactc cagcctggac aacagagtta
2401 gactccgtca aaaaaaaaaa aaaaaagaag aaaaaagaaa aaatgttaga ggaacaagat
2461 ataggagacc tactctcaaa tggtctagaa gaaaaaatgt gtatgtgcat gcctgtgaga
2521 acacacacgt acgtacacac acacacagat aatgacaggg caaaggttcc aaaattttaa
2581 acctggtaaa tctcggtacg ggtatacagg agttgttcta ctacactatt ctttcaacat
2641 ttttggaagt ttgaacttac ttcaaaataa aaagttttcc aaactttagg cagttacttc
2701 tctcccattc tgcctgctct gttgggcctg gagaccatac accaggaggg atgacggttt
2761 atcaagtgtt atgctctgat gcgtgactga aaaggccaac ccagctctgg caattagcaa
2821 gaaagcacaa tatgaagttc ccaggaaaaa aaaaaagcaa aacaaacttt tgaatgattt
2881 tagtttttaa acatc^^ ct <^caaac agtaatctgg atttaatcac aacctagtga

3001 taccttcaga tc^aattt rf!?"?' tatactaaat agcaaaacat caggaaJaEt 
ta cccccaga tctttaattt caatccataa aagatatcag agatattttc tcctfrrt^^ 

SIE ™ -~ ~ ssssss 
i iiE ~ *™ ssss sees 

3361 L,,^" ^5 tttttaa aa tagagatg aggtcttgct atgttgctcc agctggtctt 

1491 SStfi g f tcaa 9«=ga tcctcctgcc ttggcctccc aagatgctaa gattacaggc 

llll ^ 9 C atgcctggtc ttcttcttct tgatcttagc calaaggcca IgaagtglX 

\tll aga ? gaggac ac "9 aa 9tg tagttgggca aggagccttc taccagctgc ^acf^ctt 

Hi} 5 gttcctgac ttttaaaagt gtgttgctat tgatacacag tctcctqa?a tataaaatae 

3601 tgggaggatg aagctaagtt actcaaagtg clattcagaf actgggccca Stcta^a 

"51 ccfqcacttt aa^ 3933 " CatttCtaga ^ctgagLt gStllctca? acctgtaat? 
3781 99gaggccaa ggcaggagaa ttgcctgagc tcaggagttt gagacctgtc 

38U oclcacacct I^ 33 " 0 catctttacc aaaaacacaa aaaa?taact gggtttggtg 
J841 gcacacacct gtggtcccag ctacttcaaa aggctgaggt gggaggqtct cttaaaclta 

39° ESSE* tgccactgca I? ~ IgI " g KS52 

4021 atotataaac tS«« a taaaaataa aaagaaatcg tttctagaaa ctgttttccc 
4081 cafclaacar 9cagcctgag gcaggtgctg agatggggac ctggaaaagg 

4141 agtctffctt act? 3 ?- 3 ? aaacaatgtg a ctttcctgc tccaaaaigt gcalttcall 

S! ~ SSSS SSSS S™ E32S 
2™ ~ SSSSS SESR sss 

iffi jssss :jsss ess: 

4561 atftgcttaa clf™«™ ccgaacagtt accccatctt caggcctact gagttcaalt 
4621 gggcctcaat SSSfSS ? agtaact " tacctggcct caactggcag cagatattct 
4681 SSStttttt tttotf^^ " aggaaatg Stcacagaca caaaataagc ttaacaaaag 
4741 fcclqqttqq aa L"^ "ttgttttc tgttttttga gataaggact cactctatcl 
4801 £ctt£2S tetSSSS ' ggcgtgatc a c9gctcact gcagactcaa gtgatcctcc 
4861 ttttcttftc tttttt^t? ^??^ aCCaC aggcgtgtgc catcacacca ggctaattat 
4921 gcaataatac « ^ ttttgagacg gagtttcgct ctttttgccc aggctggagt 
4981 IctcaSc ctaS?^ cacca * aacc tctgcctcct gaattcaaac gaatctcctg 
5041 atatt?t^n rt^l " gggattaca 9 gcatgcgcca ccacgccggc taattttttt 
5101 ?caaatoat? „ S 9 g tttctccat gttggccagg ctggtctcga actcccgacc 
5161 accclgccta tttSSS-? ??^ tcccaa a gtgctggga tcactgacct gagccalcgc 
5221 tcttaS 3 ^^"5 3at " ttcaca « a 3atgaggtct tgctatgttg cccacactgg 
5281 LoS' ctgggctcaa gtgatcttcc tgccttggtc tcccagtgtt gggattatag 
534^ ?a?tacctg a ctoaatcaa 9 ^? 9 " gtCc tttctggggt gattagalgt fgggaccatg 
5401 atcatcaafa rr^ 9 ^ 9 atta t aaa « cctatggtca ctgtcctggc aaaacatgga 
5461 caaaaaa? 3 ? ™£ ° cagagtgca 9 "aataacca ggaagtaagc aagagaaaga 
5521 t 3339 ^, 99c a 9tcaaa acagatttga caggccaagt cagatcctcc tctgaacgag 
SSRi ^ 39a ? 9 f aaataa agac aggattgcca taatgcctct gtgctaaaag cttatettqt 
564? acacagtctt SSSSS ~ C " C399t cttgagtaag ^Ittgctga catcaccc?c 
5701 ttoctcctaa ^™ ^ ttctaaccct gtgttagaag cagtaacaca gaagatttag 
^7fii g ? ca 9 ca 9 t 999 agctattgtc taagagatac aaaggagaaa aaagtatacc 

llll o?. a ^ aa9t ? atatcacct ctggggctgc caccalatca cctSctaSJ c^fgagggg 
58B1 a taga = aa 9tt ccaaatcttt tgcaaattaa acaaccccag gtcagglgfg 

5941 caggaatttQ aaaccaac^ Cagcactttg 9ggggctgag gtgggtgga? laccgagg? 
finni I 39 ? 39 : 9 a 9 acca 9cct ggccaacaga gcaaaacccc atctctacta aacaaaatac 
till 33 o 33 ^ 3aC Caggcgtagt 9gtgtgcacc tgtagtccca gctacttggj aggctgaggc 
6121 n 3 ™ 3 ^ 9 C " gagtcca 9gaggccgaa gttgcagtaa Iccgagaicg cgcca^tgca 
till tacaalaacl tgagactcca "tcaaaaaa Laaalcaaf alaagcclat 

6241 agtatct a t a aaatat^ 33 aacaacgaat taaacaaccc caaagattgc acaaatttca 
6301 cattacaaa 3 - l?al»t? I c f a 9 a aagcc tggcccatgg acatttttca acagcatctc 
6361 ccct?c a aaa LS« gtgagtcaca caggcatggc tgagtcccac taatgcacat 
64P1 9 ? tactctccaa tcaccagccc caggtgccca ctcaagccca gctcttagtg 

till tflcgtaccc IttllltTc- ? CaCttCCac tcctaccaca cagggLgag ccacacccS 
6541 tgtclc a tet «c S ^ gg = agcatt a ttttgagag ccttcgcttt actgcacgtc 
6601 tfalcactat tggtccatga gcccctggtg ggaactttgt ctctggtaac 

6661 t aa ccaaa?t ZlZfnZl^ ^ ggacaag 9t gtctggagaa aaacaaactc ctccctggga 
67?i ItZalZt 9 . ccca 99 a ttc tagaaggtta gttttgcaaa cctttaaaga agggattttc 
6721 atcaaggggc ccacagatcc ttcattgagg tttatgagtc ccacatcala glffgggtgt 
2} """^^ cagactctct taaagtccat gatcctaaaa cagttaagaa ctaatgctgt
6901 t^tgggtca aagccacagg gaacctgcca tgtggatjct gcagcggggt 

m« !=? 9 f tCagC ca 99ccgcct ttcaccgtgt tctgttttcc ctcccagctt tagctccgcc 
70^ ™S aaC acccattaaa cgcacttctc attttcctca tcataaLtc tglgtjgggt 
7021 gggagcaaag gcccgctgga tcagctagag aaaggagggg aaactgctca gtctgcagat 
7081 ccccagtggg agcagttaaa taacaaaaac ctgagcltgc ctcttltccc Sc q"? 
7141 cacaaggaaa acaccgtcac caacgactgg attclagagg gggaggagga cgacgactat 
7261 t^a^r^ 9 agaagaCatc "gtgaagac gacgactaca EcgacItJgt cgacfgtctg 
7261 tcagtttccc cgacagactc tgatgtgagt gctgggaaca tcctccagct ttttcatggc 
7381 SLT tcca 9 c ?tct taacatcctc aacgccaagt tcgctttlaa cctctaccga 
llll ? tgctgaaa 9 accaggtcaa cactttcgat aacatcttca tagcacccgt tggcatttct 
1501 att?tocat? ?^f 9att ^ cttaggtct 9 aagggagaga cccatgaala ag?gcactcg 
llll aatctettrr ^S^" ^ gttaat 9 cc agcagcaagt atgaaatcac gaccattcaE 
7691 ?-» I f gtaagctgac tcatcgcctc ttcaggagga attttgggta cacactgcgg 
llll gtafgagao? ESS??* ^agaagcag "tccaatcc cgcttgfctt caaaacSal 
7741 ?S a K attactttgc tgaggcccag atagctgact tctcagaccc tgccttcata 
1B01 aatataaacc nr 3 ^ 30 ^ catgaa ^ ctc accaagggcc tcataaaaga tgctctggag 
7861 r a r a r a ? a ^f "?ctaccca gatgatgatt ctcaactgca tctacttcaa aggtaagagg 
7921 ttlrlt^ g"ctcacag caaacccaca acatactatt tttgtatgtg gl?aga?tgl 
7981 tan™™ 3 - ctgtactgta aetataattt atccaggaaa actagacicl Igat?gac?c 
llll ctltctllar a "?? 9aagg ccaa 9 c tgaa gtgacagtag canctgacac t?actjagcc 
8101 gctttaacac agccttgtga ggtcatcact gttattagca cccccatEtt 

8101 acagaggaag ccaccaacac atgaagtaaa aggatgggct gggcgcggta octcacorr? 
2S ?«aS Cactttggga ^gccgaggca ggcagaffac ?S gSttcgaga 
8281 o S^ «acagacca acatggtgaa aacctggctc tactaaaaal IcaaaaatEa 
8341 gc 9Stgggtg cctgtactcc cagctacttg ggaggctgag gcaggagaat 

8401 laalla^ H?" 890 ? 9 agattgca 9t gagccgagac tgtgccactg LctStlgcc 
8401 tggacgacag agcgagactc catctcaaaa aaaaaaaaaa aagaagtaaa acgatgctcc 
852? ag " attaag ggscagagcc aaagctgaac ccfggSaggc calccctagc 

8581 a f a "?gaagaa ataatacaaa aactgtttta gcatttggcc agcctggalt 

llll glaatacacc ? f ^ cccaattatc aataagcagg aatatagaca aaagglLaa 
8701 ™»?= a ^ t?tgaactat tcagcttgag cagctgacat tgacacctac aagtgctttt 
8761 ttgaactact gggcaggtgg gatggagaaa taaattacta tttclccagc 

8821 acacItS SZT 9 ^ aagggcactt tttaaggagg tcaccccaca cccatcacac 
8881 ctcttttela SEES!" c " aggaata aataagcatg gacttgtaaa atccaaacct 
8941 atatcctcac ctggaccaga ccagaagaaa cctctacttt actctctaag 

9001 acfclacclc ?? 33 ^??? 33 acacgaggaa tggttcggct tcaggactaa ttgcggtgac 
Qnfii ^ * ttctctttgc caccaaggac taccaggtac ctgcaaaggg cagtacttgg 
llll a af^ a ^ 9C tttctgctag ttagctcccg tggttttata gclgccclgg cgLgJaagg 
9m g^cacacae ^rT? 9 C " ctgttca g99aaagggg gccagagccc clccEgat™ 
9241 ?nn^=f a ctgctctgt 9 ccttggctga ggcccctgca gctctacaag gcaggcattc 
9101 altSaS ?" aa ? Ca9 ? gtcactct 9 a cacccaggtt tccaccccai ggcalggcac 
9161 " tc ccgtgggt ggaatcaaag gctgagttct aacaggcttg cggcagacac 

llll tacaagt? 3 ? oaa^f 9 ' 3 Catgatgaac acacatatcc ttttcattac aggttattag 
9481 tat aa o^= ggaatt 9 a 9= aaacaagagt ctaagcgctg gtttcaccac ttctcgtttg 
9541 sacaagtcat tcaacatctc tatgactcag tttccttatc tttattacag 

llil a ?"? a "" cactct g a =a gggccgaggg aagaaccata agcgatggca atgcaacagl 
llll farl*^ f 9 acaaga 9 ctc agcgaatttg agggaatgaa actftagatt acaatactlg 
9?21 a^ aa ^^? a taaacatatg atattgttag tgacatttat tttactEcta ctagcaaata 
97^1 a r^ a ^!^ a 99 act gact ttagaacagg ctggcagaag catttttggc agcatcaaag 
llll ^^of tac tggtctg ttggagcccc ccaagtacac caaagagcct ctgcattagl 
llil Sgtaaac llltl^ Caggcagaga agtacagcag tgaglcatcc ctgcctgcft 
9961 ?Sa™ aaatgatcag gcatggtcag ttgacaatct cctaaacaca gtaacccgtg 
10021 agtcactcaa SSSS^S acgtgcaaat gcttctgctt cctttcccca Ecatgagaa? 
10081 a?acttctr a «^ 9 " Cat cacaag 99at caaatgctag gagtacccaa tcattcatgg 
10141 aaggggacga gtgtctagaa gtgtaatttt aatttcactc aatttcatat 

1S2SI r« aa == ccatta ctaa ttttgttcta attttaatgt gataatcact ttgtaaagca 

^261 ttrtf, aC39 aggca 99Ctc tcatgaggaa gtcagaagga aagaatccca agagacalgg 
iSISi Ictglagaca cttttant?" 3agggcCgtg attcccaaaa gagcaatttt g?ccccaagg 
T^oT ccc 9 aa 9 aca cttttggttg tcacaacctg gggggttgga gtaaacatta ctoatatcta 

10JS SS a^ taSaCa c « ?a = c " g cacSggcag ccflcattgc 

10501 caoact^on Ztnlt gg C aaatgtcaaa aatgctgagg tcgagaaacc ctgggtgagg 

insfii f 3 ? 8 f agg 9 a 9aagggaa tcgagcttca ctcacaggca ggcaggagct gtctggtact 

iS5i Staa™ =? a f a ^ tCC tgctcat «c atcctggltg clctlcclac lag^Igaaa 

10621 ccttgaacaa gttacttcac ttctttgtgc ctctgtttcc tcatatgtaa aagagggata 
10681 acaaaacgca cacaacttgc atgttgctag gagcagaaat gagataatac aggaaaggtg
10741 ctgagaagaa tgcccggcac atggccagtt ctcaactact agtcacccat tactattagt
10801 tactcacatc ttagagctaa catagacatg ggcttattcc tggatacaca gcactgtccc
10861 catatctaca gtggtgatcc taagggcaac atggcatcac ccaaatgtct tgttagtcac
10921 tacagaatca cagtgtgagg gatgaaggcc atcaagacag agctgaggct ggcagggtgg
10981 ctcatgccta taatcccagt gctttggaag gctgaggcag gaggattgct tgaggccaag
11041 ggtttgagac cagcctaggt aacatagcaa gaccccatct acaattaaaa aaaaaaaaaa
11101 aaagacagaa agaaaaaata gccaggcgtg gcatgtgctt gtagtccaag ctactgggga
11161 gggaggctga ggcaggagga ttccttgagc ctgggagtgt gaggctgcag tgagctatga
11221 tggcatcgcc gcactccagc ctgcatgaca cagtgagacc tggtctcaaa aaccaaataa
11281 taataacagt aataaaagct ggaaagagct caaagttact catttgacag atgtgacaga
11341 tgaagaaata gaagcgagtt aggtgcctta ccatggtcaa acaactagtt cgtatcagac
11401 cctactccag aaactattcc agtccgggta acctctcgtt aacctctctt gttagaaatg
11461 caaatttctg cccaaatcag gcctcaggaa tcaagagact gtggggtcgg ctctgcaggc
11521 tatctgaatg aggcctccag ggaaatcaga ttcactctca agggtgagac gatttcccta
11581 aaggaacctt ctcataacag cctcttcctg tggcctttac aggatcctgg gtgaataaat
11641 tcccagtgga aatgacacac aaccacaact tccggctgaa tgagagagag gtagttaagg
11701 tttccatgat gcagaccaag gggaacttcc tcgcagcaaa tgaccaggag ctggactgcg
11761 acatcctcca gctggaatac gtggggggca tcagcatgct aattgtggtc ccacacaaga
11821 tgtctgggat gaagaccctc gaagcgcaac tgacaccccg ggtggtggag agatggcaaa
11881 aaagcatgac aaacaggtat ttcacactgt gtgtttgttc ttttgagctc ccagatgctg
11941 ggggtgtctg ggaatactgg aaaatggatc atttttttaa aaagggagaa ttatgtacaa
12001 gtacccaaga acttccatac agggccactc tgttaattca gccccaattt gttgcttgag
12061 ataagagatg attagagagc attcataagg gacacatctg ccctctaggg gccagtttca
12121 gaagttagag gcagatgact tagagacagc ttggtgcttg ctttgtggct tcgagtccca
12181 gcttcatcat ccctaaaatg ggtataattc cattacttcc ccgggtcact tgagaaaata
12241 acagaatcag cgatgctgag cgcccctccc agtacttgga acctaggagg cactcaaaaa
12301 aagattggct caactcttcc ctgcccagga aattccaagg tcctcttagc ctaccgagga
12361 cacatcattc atgatttcct ctattattat tcgttacttt gtagttaaaa ctgcaggtgt
12421 taagtactta ttgagattat tattgggtca tggcagaaag aatggagagg tcttatttct
12481 gtcttactgg atactggcta ggcccatatg aagaagtgat tctggtttga acctccttat
12541 aggacaagaa tacaaacata tgcaaccaaa ctgagaaaag taggctctca gaggaaggta
12601 tttgcccggg tagccagtca tcatgctctg tgaatttttc cttaacaacg tcccttctgt
12661 acctgcctcc ttccattcct ccctgcagcc cggcagctct tgagaaaggg actgcatctt
12721 tttttttttt ttttttttga gacagggtct tgttctgtca cccaggctgg agtgcagtgg
12781 catcatcatg gctcactgca gcctcaacct cctgaactta agtgatcctc tcacctcagc
12841 ctcctgaata gttgagacta caggcgtgca ccttcatgcc cagctaatta aacttttttt
12901 ggtagagatg aggtctcgct gtgttgccca ggctggtctt gaactcctgg cctcaagcag
12961 tcctcctgcc ttggccttcc aaagtgctgg gattaacagg cgtgagccgc tgtgcctggc
13021 ccatttgact tttaattgag atcttacttg gtgcaaggta tgagctaggt aaaagagtga
13081 agaagatcaa gccttcctgc ccatccagct gggattgcac cttaaatctc tttatcccct
13141 gcaaagtgcc agactaactc cacaggcact actgttgcta tccgccccct tagggattga
13201 gtaagttgag gcaaagattg agatattcag cattgtctag tatatacagg aaaggttctt
13261 tttaaaagta cactaccaga tattcgactc cttaattaca aaaaaaaaac caaatgccta
13321 aaattgggaa accaaaccag agaattattt tagatgcctt tttaaaccat aaaccaggaa
13381 aagttctgct gctaaccttg aagataggaa acgaaccata cagtctcaag gaaataatca
13441 tgcaacagaa aacacacctc agttttcagt agcggaatta caaaggagtg tgcttcctaa
13501 aatcctcaac tgacagtccc ggaatataaa ttttaataag tgctatatca attctgtgat
13561 aaatataacc cgtggccctt taaagggaaa atcatgattc ttttgtaact tgtggttcaa
13621 taaaactggg cccccctttc cttttctgtc tagaactcga gaagtgcttc tgccgaaatt
13681 caagctggag aagaactaca atctagtgga gtccctgaag ttgatgggga tcaggatgct
13741 gtttgacaaa aatggcaaca tggcaggcat ctcagaccaa aggatcgcca tcgacctggt
13801 aaccactccc ttgtccaccc ccgacccgtc cccagggtct gcctcagcac agccccacct
13861 ccacttgccc ttcctaccca ccccccaatc tcatgtccca gcttggggtg ctgagtctgc
13921 tcttcggcct gggtgggata cacagaatgc ctagtttcat ggatgccagc tggagagcac
13981 ggcacctggc agacacttac tgggcagggg ggatcccaag agcagccatg gggtgagccc
14041 cactcccgct gacaccagag acaggggaga catgtgctgc ggtctgggaa atagctaccc
14101 ccagccaaat catgaaagag ccattaaaca ccgcactata caacatactt aacttaaacc
14161 aatcgggtcg ctcagcaaaa gagagagaac accagtccaa acagtgcagc agacccagtt
14221 ccccatcccg gagaagtgcg cagcagtgtg gggagctgga gctggggtgg ctgtcctgca
14281 ccagccccca cgaccctcag accacaggca ctgccaagag ggaacatgaa cctagccggc
14341 ctctaagtgc aacggctgcc cctgacaggt ggtgacagat attttcaaga gtgactctga
14401 ccagctgtga tttccacctt acatgttgtc tttggatcct ttccctgaat gatatgagat
14461 tgtgctggga actctagccc tctgtgtgct gacctccaga atctgacaac tttcctttcc
14521 aaacagttca agcaccaagg cacgatcaca gtgaacgagg aaggcaccca agccaccact
14581 gtgaccacgg tggggttcat gccgctgtcc acccaagtcc gcttcactgt cgaccgcccc
14641 tttcttttcc tcatctacga gcatcgcacc agctgcctgc tcttcatggg aagagtggcc
14701 aaccccagca ggtcctagag gtggaggtct aggtgtctga agtgccttgg gggcaccctc
14761 attttgtttc cattccaaca acgagaacag agatgttctg gcatcattta cgtagtttac
14821 gctaccaatc tgaattcgag gcccatatga gaggagctta gaaacgacca agaagagagg
14881 cttgttggaa tcaattctgc acaatagccc atgctgtaag ctcatagaag tcactgtaac
14941 tgtagtgtgt ctgctgttac ctagagggtc tcacctcccc actcttcaca gcaaacctga
15001 gcagcgcgtc ctaagcacct cccgctccgg tgaccccatc cttgcacacc tgactctgtc
15061 actcaagcct ttctccacca ggcccctcat ctgaatacca agcacagaaa tgagtggtgt
15121 gactaattcc ttacctctcc caaggagggt acacaactag caccattctt gatgtccagg
15181 gaagaagcca cctcaagaca tatgaggggt gccctgggct aatgttaggg cttaattttc
15241 tcaaagcctg acctttcaaa tccatgatga atgccatcag tccctcctgc tgttgcctcc
15301 ctgtgacctg gaggacagtg tgtgccatgt ctcccatact agagataaat aaatgtagcc
15361 acatttactg tgtatctgtt ataattctct attttttgaa gctcaaatat caaaagccaa
15421 atccaaattc ctggataact ccaggtatga taaaggctga gaggaagtca cttgagcacc
15481 acaatgtgcc acagcagggc atgttctcag gacaggacag gtgtgtgctg aatcctgggg
15541 agggtctgtg cagtacccca gaactgtggg gtgctaagtg gcacacaagc cccagggctc
15601 ccacagtcta tgccaggctg ctgcagcttt catccctcat acctggtcct gcagtgggtc
15661 tggtttgaca gagcagatga cacctgagga atatgtttct ggatccttca atccctgggt
15721 aagacaagtg aaatccacag aggctgttca gcacgcaaga gtgccagtgc tctttcagtg
15781 aggggatgac tgacggtcac aggtgctgtg tgtgcaggtg tctaactgta accccacagc
15841 ctggcagat
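
The HUMHCF2 feature table gives the coding region as join(6964..7852,11623..11896,13654..13798,14527..14718), and the record states a length of 15849 bp. As a consistency check on those figures, the following Python sketch sums the segments of a location string and tallies the BASE COUNT line. It is illustrative only and not part of the GenBank record: the helper names segments and spliced_length are hypothetical, and the parser assumes simple forward-strand locations built from start..end pairs.

```python
import re

def segments(location):
    """Return (start, end) pairs found in a GenBank location string.

    Assumes plain forward-strand spans such as 'join(a..b,c..d)' or 'a..b';
    complement() and partial (< or >) locations are not handled.
    """
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def spliced_length(location):
    """Total number of bases covered by all segments (coordinates are inclusive)."""
    return sum(end - start + 1 for start, end in segments(location))

# Values copied from the HUMHCF2 record above.
cds = "join(6964..7852,11623..11896,13654..13798,14527..14718)"
base_count = {"a": 4477, "c": 3814, "g": 3642, "t": 3916}

cds_length = spliced_length(cds)
print(cds_length, cds_length % 3, cds_length // 3)
print(sum(base_count.values()))
```

Under these assumptions the spliced coding region comes to 1500 nucleotides, a whole number of codons (500, including the stop codon), and the four base counts sum to the 15849 bp stated for the record.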
LOCUS       HUMTHRR      3472 bp    mRNA            PRI       10-OCT-1991
DEFINITION  Human thrombin receptor mRNA, complete cds.
ACCESSION   M62424
NID         g339676
KEYWORDS    thrombin receptor.
SOURCE      Human DNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 3472)
  AUTHORS   Vu,T.K., Hung,D.T., Wheaton,V.I. and Coughlin,S.R.
  TITLE     Molecular cloning of a functional thrombin receptor reveals a
            novel proteolytic mechanism of receptor activation
  JOURNAL   Cell 64, 1057-1068 (1991)
  MEDLINE   91168254
FEATURES             Location/Qualifiers
     source          1..3472
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     CDS             225..1502
                     /codon_start=1
                     /product="thrombin receptor"
                     /db_xref="PID:g339677"
                     /translation="MGPRRLLLVAACFSLCGPLLSARTRARRPESKATNATLDPRSFL
                     LRNPNDKYEPFWEDEEKNESGLTEYRLVSINKSSPLQKQLPAFISEDASGYLTSSWLT
                     LFVPSVYTGVFVVSLPLNIMAIVVFILKMKVKKPAVVYMLHLATADVLFVSVLPFKIS
                     YYFSGSDWQFGSELCRFVTAAFYCNMYASILLMTVISIDRFLAVVYPMQSLSWRTLGR
                     ASFTCLAIWALAIAGVVPLVLKEQTIQVPGLNITTCHDVLNETLLEGYYAYYFSAFSA
                     VFFFVPLIISTVCYVSIIRCLSSSAVANRSKKSRALFLSAAVFCIFIICFGPTNVLLI
                     AHYSFLSHTSTTEAAYFAYLLCVCVSSISSCIDPLIYYYASSECQRYVYSILCCKESS
                     DPSSYNSSGQLMASKMDTCSSNLNNSIYKKLLT"
BASE COUNT      933 a    817 c    785 g    937 t

si SJSS 080 ga = cgcgcgc cccagtcccg ccccgccccg ctaaccgccc cagacacagc 
121 IZZllZZl™ SS? CttW accctgat « tacccgtggg caccctgcgc tctgcctgcc 
181 ^^ CCCgaC ccgca 9 aa 9t caggagagag ggtgaagcgg agcagcccga 

241 cctcc =35ag cagcgccgcg cagagcccgg gacaatgggg ccgcggcggc 

111 I 9 " ' 93<=cgcctgc ttcagtctgt gcggcccgct gttgtctgcc cgcalccglg 
361 a«»!™f a 9 aat «=aaaa gcaacaaatg ccaccttaga tccccggtca tttcttctca 
421 Ilactaaata ««ccatttt gggaggatga ggagaaaaat gaaagtgggt 

ill SfSSfSB^ cagattagtc tccatcaata aaagcagtcc tcttcaaaaa caacttcctg 
111 ^1* " C agaagatgcc tccggatatt tgaccagctc ctggctgaca ctctttgtcc 
lol EEJSE? caccggagtg tttgtagtca gcctcccact aaacatcatg gccatcgttg 
661 ^ ' Saaaatgaag gtcaagaagc cggcggtggt gtacatgctg cacctggccl 
721 SEE??™? g "gtttgtg tctgtgctcc cctttaagat cagctattac ttttccggca 
781 ?«a acc ?»^?? 9t f gaattgtgtc gcttcgtcac tgcagcattt tactgtiaca 
111 »?^™ tatcttgctc atgacagtca taagcattga ccggtttctg gctgtggtgt 
901 S «tccctctcc tggcgtactc tgggaagggc ttccttcact tgtStigcca 
III ggccatcgca ggggtagtgc ctctcgtcct caaggagcaa accatccagg 

102i K ^ a ^ CaCt acct g tcat g atgtgctcaa tgaaaccctg ctcgaaggc? 
1021 actatgccta ctacttctca gccttctctg ctgtcttctt ttttgtgccg ctgatcattt 
llll SoJ? "atgtgtct atcattcgat gtcttagctc ttccgclgtt gccaaclgca 
120i tSSSSSS ccggg " ttg ttcctgtcag ctgctgtttt ctgcatcttc Itcatttgct 
llll S222S£ aaacg 5f ctc "gattgcgc attactcatt cctttctcac acttccacca 
1261 cagaggctgc ctactttgcc tacctcctct gtgtctgtgt cagcagcata agctcgtgca 
1321 tcgaccccct aatttactat tacgcttcct ctgagtgcla galgtfcgtc tacagcafct 
1381 tatgctgcaa agaaagttcc gatcccagca gttataacag cagtgggcag ttgatggcaa
1441 gtaaaatgga tacctgctct agtaacctga ataacagcat atacaaaaag ctgttaactt
1501 aggaaaaggg actgctggga ggttaaaaag aaaagtttat aaaagtgaat aacctgagga
1561 ttctattagt ccccacccaa actttattga ttcacctcct aaaacaacag atgtacgact
1621 tgcatacctg ctttttatgg gagctgtcaa gcatgtattt ttgtcaatta ccagaaagat
1681 aacaggacga gatgacggtg ttattccaag ggaatattgc caatgctaca gtaataaatg
1741 aatgtcactt ctggatatag ctaggtgaca tatacatact tacatgtgtg tatatgtaga
1801 tgtatgcaca cacatatatt atttgcagtg cagtatagaa taggcacttt aaaacactct
1861 ttccccgcac cccagcaatt atgaaaataa tctctgattc cctgatttaa catgcaaagt
1921 ctaggttggt agagtttagc cctgaacatt tcatggtgtt catcaacagt gagagactcc
1981 atagtttggg cttgtaccac ttttgcaaat aagtgtattt tgaaattgtt tgacggcaag
2041 gtttaagtta ttaagaggta agacttagta ctatctgtgc gtagaagttc tagtgttttc
2101 aattttaaac atatccaagt ttgaattcct aaaattatgg aaacagatga aaagcctctg
2161 ttttgatatg ggtagtattt tttacatttt acacactgta cacataagcc aaaactgagc
2221 ataagtcctc tagtgaatgt aggctggctt tcagagtagg ctattcctga gagctgcatg
2281 tgtccgcccc cgatggagga ctccaggcag cagacacatg ccagggccat gtcagacaca
2341 gattggccag aaaccttcct gctgagcctc acagcagtga gactggggcc actacatttg
2401 ctccatcctc ctgggattgg ctgtgaactg atcatgttta tgagaaactg gcaaagcaga
2461 atgtgatatc ctaggaggta atgaccatga aagacttctc tacccatctt aaaaacaacg
2521 aaagaaggca tggacttctg gatgcccatc cactgggtgt aaacacatct agtagttgtt
2581 ctgaaatgtc agttctgata tggaagcacc cattatgcgc tgtggccact ccaataggtg
2641 ctgagtgtac agagtggaat aagacagaga cctgccctca agagcaaagc agatcatgca
2701 tagagtgtga tgtatgtgta ataaatatgt ttcacacaaa caaggcctgt cagctaaaga
2761 agtttgaaca tttgggttac tatttcttgt ggttataact taatgaaaac aatgcagtac
2821 aggacatata ttttttaaaa taagtctgat ttaattgggc actatttatt tacaaatgtt
2881 ttgctcaata gattgctcaa atcaggtttt cttttaagaa tcaatcatgt cagtctgctt
2941 agaaataaca gaagaaaata gaattgacat tgaaatctag gaaaattatt ctataatttc
3001 catttactta agacttaatg agactttaaa agcatttttt aacctcctaa gtatcaagta
3061 tagaaaatct tcatggaatt cacaaagtaa tttggaaatt aggttgaaac atatctctta
3121 tcttacgaaa aaatggtagc attttaaaca aaatagaaag ttgcaaggca aatgtttatt
3181 taaaagagca ggccaggcgc ggtggctcac gcctgtaatc ccagcacttt gggaggctga
3241 ggcgggtgga tcacgaggtc aggagatcga gaccatcctg gctaacacgg tgaaacccgt
3301 ctctactaaa aatgcaaaaa aaattagccg ggcgtggtgg caggcacctg tagtcccagc
3361 tactcgggag gctgaggcag gagactggcg tgaacccagg aggcggacct cgtagtgagc
3421 cgagatcgcg ccactgtgct ccagcctggg caacagagca agactccatc tc
LOCUS       HUMLPLFI     3877 bp    DNA             PRI       07-JAN-1995
DEFINITION  H. sapiens lipoprotein lipase (LPL) gene, exons 7, 8, and 9, and an
            Alu repetitive element.
ACCESSION   M76722 M76723
NID         g187215
KEYWORDS    Alu repeat; lipoprotein lipase; plasma protein.
SOURCE      Homo sapiens blood DNA.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 3877)
  AUTHORS   Chuat,J.C., Raisonnier,A., Etienne,J. and Galibert,F.
  TITLE     The lipoprotein lipase-encoding human gene: sequence from intron-6
            to intron-9 and presence in intron-7 of a 40-million-year-old Alu
            sequence
  JOURNAL   Gene 110 (2), 257-261 (1992)
  MEDLINE   92165069
FEATURES             Location/Qualifiers
     source          1..3877
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /cell_type="lymphocyte"
                     /tissue_type="blood"
                     /map="8p22"
     intron          1..198
                     /partial
                     /gene="LPL"
                     /note="G00-120-700"
                     /number=6
     CDS             join(199..319,1840..2022,3052..3156)
                     /partial
                     /gene="LPL"
                     /codon_start=3
                     /db_xref="GDB:G00-120-700"
                     /product="lipoprotein lipase"
                     /db_xref="PID:g553523"
                     /translation="FHYQVKIHFSGTESETHTNQAFEISLYGTVAESENIPFTLPEVS
                     TNKTYSFLIYTEVDIGELIJ^KLKV^SDSYFSWSDWWSSPGFAIQKIRVKAGETQKKV
                     IFCSREKVSHLQKGKAPAVFVKCHDKSLNKKSG"
     exon            199..319
                     /gene="LPL"
                     /note="G00-120-700"
                     /number=7
     gene            join(199..319,1840..2022,3052..3156)
                     /gene="LPL"
     intron          320..1839
                     /gene="LPL"
                     /note="G00-120-700"
                     /number=7
     repeat_region   complement(746..1027)
                     /gene="LPL"
                     /note="G00-120-700"
                     /rpt_family="Alu repeat"
     exon            1840..2022
                     /gene="LPL"
                     /note="G00-120-700"
                     /number=8
     intron          2023..3051
                     /gene="LPL"
                     /note="G00-120-700"
                     /number=8
     exon            3052..3156
                     /gene="LPL"
                     /note="stop codon (tga) is interrupted by intron 9
                     between tg and a; G00-120-700"
                     /number=9
     intron          3157..3877
                     /partial
                     /gene="LPL"
                     /note="G00-120-700"
                     /number=9
BASE COUNT     1145 a    787 c    746 g   1199 t

eJ Icaccagtga ttcfatat.r c ' a W tat *» acactgtgca tgatgaagtc tttccaagcc 
121 ttctqaat? 9 - c t^rS? gtgcacttcc ggtttgagtg ctagtgagat acttctgfgg 
181 gaatftcctc cccaacaatc t?S?^f 9 ataCtttcac aaagattgat caacatgt?? 
241 gtgaaaccca taccaatcag gcctttgagl ttfctct^a , tCatttttct ^ctgaga 
301 agaacatccc attcactcto toaataf 3 ? 3 " tctct 3 ta tggcaccgtg gccgagagtg 

5S SS a™ ™ SS52K S£2S SSJ 

^ SSS2S ™ ™ 39 "™ SHE 

S2 SSS3 ™ aa 3SSS =SSS SSSS 

721 ttttttttt? tttt?t a t a t " aagCagta agaagtccat gacaaagtgt tagctc?ttt 
781 gcaatoatte "tttgagat ggagtctctc tctattgccc aggctggagt 

841 tctcagcctc Icgagtaact 'ctacctccc gagtccaaac aa?tc??c?g 

901 tatt a tra^ f 9 ggggctgcag gtgcccacca ccatgcccag ctaatttttg 
961 cagjtgatcc alccocS ^caccatg ttggccaagc tggtcttgaa ttcctgatc? 
1021 ccllgcctac cctt?acta? Seaa* 33 gtgctgg g at tacaggtgtg agccaccatg 
1081 aattactaga tgaacaaatc ttt™,? 3 aataaaa g ta aggcaacttg atacttttac 
1141 atgcgaacct accatglatc attract* gcca9t g ca g acaaggtggt gaagcagaac 
1201 taataacttt ccataactar ™?f?? C 5 agaaccctcc aggtgcggaa ggtagtattt 
1261 tttatcctaa aaaS^ aaaatattat tacatagaag ggagtgattt ttttctaata 
1321 actagcataa aSaalcc "taaaaaca tcaattacag tcgtacctat 

1381 ttatcaaatc atta* 333 ^ cag 5 atccaa cattgaggca gtgggtaaat gaatcgtggt 
1441 aacatag?aa aaaatggaat ataaaltcto taaaaa " at aattgtagga aacccaggfa 
1501 tgctatgatt gtagSat aatat^aa^ aa ? agaataa agaatagaga atcgtatgtg 
1561 aaataaMAt ? aatgttcaag tatcaacaca aattgaaaag gaatacatga 

1621 taat?ca? a t " C 9 aatgattgac ttcaggattt tcttttagaa Etgtattaaa 

1681 ca?cgalc?c cattt?^ a " gctggaa tgtggatata atttaaaata tactaaatgc 
nil aaalaataaa SS?tE£ tttacaaall acatttttgt flcattttt.. aatatccclt 
1801 atctctataa Itaaccaaat tta^act?? T^?*** gtggggggca ggg a gagctg 
1861 gacctactcc ttcctaaT^ llt^l 5 I tttgtttagg cctgaagttt ccacaaataa 
1921 laaatggaag agtgattcat acttta^^ agatattgga gaactactca tgttgaagct 
1981 cattcaaaaa atcfaZn?*. acttta g ct g gtcagactgg tggagcagtc ccggcttcgc 
2041 cttcctlca? tttaS^ 33 aagcagga g a gactcagaaa aagtaattaa atgtattttt 
2101 tcalaattcl aacaaaSn^ ^ acctgat g t caggacctag gggctgtatt tcaggggcct 
2161 tttaggagtl ttSSSS t?2?S??S tgtatttatt actgtatgat gta£ttttc 
2221 ttfltZtttcS tgtaaggaaa acataaaccc ^!!?" C ? 9 9gggggaagt gacagtattt 
2281 aoaattaoaa 8 acataagccc tgaatcgctc acagttattc agtgagagct 

2341 %E£gl £2£££ agalactia a " tggcact g"tcttgta agtfcfaaat 
2401 ccagallggt gagattccal oltaft^^ t9gataa ! ca aagattcaaa ccaacctctt 
2461 ccataaaatn a*; 8 *- Qataatctca acctgtctcc gcagccccac ccatgtgtac 
2521 gggattttac L ^ agatcgctat aggatttaaa gcttttatac taaa?g?gct 

llll JSS2S SSSSS SSKSS t t9 ta aa " ta "*4&t 

"S clatgcctag " 99993?gg 2K£SS £252 

2161 g 3 tgltct a ? t^ca??^? atg * cctaat tccttaatgc agatgctaag agatggcaga 
2821 gca?gcct5t ctatctaaat cca ^ ta ^ aagactgctc taggllgtct 

2881 ttaccctc?g attctaa^ ^ aac tagctt ggttgctgaa caccaggtta ggctctcaaa 

2941 at'acccagl atgat?atgt SSSS32L £1?*?™ ttattgggaa tatcaaaaca 
3001 agtgctttta attatrr^ attatttaaa cagtcctgac agaactgtac ctttgtgaac 
3061 c?g?tctagg gaglaagtgt SUtttaca S 3C3tCC3tt ttcttcca ca gggtga?ctt 
3121 atgccatgic Lgcctctaa ata™ gaaaggaaag gcacctgcgg tatttgtgaa 
3181 ctgggcaEcc tglglttg 93 aggctggtga gcattctggg ctaaagctga 

aau tec cgagcccgca ccctaaggga ggcagcttca tgcattcctc ttcaccccat 
3241 caccagcagc ttgccctgac tcatgtgatc aaagcattca atcagtcttt cttagtcctt
3301 ctgcatatgt atcaaatggg tctgttgctt tatgcaatac ctcctctttt tttctttctc
3361 ctcttgtttc tcccagcccg gaccttcaac ccaggcacac attttaggtt ttattttact
3421 ccttgaacta cccctgaatc ttcacttctc cttttttctc tactgcgtct ctgctgactt
3481 tgcagatgcc atctgcagag catgtaacac aagtttagta gttgccgttc tggctgtggg
3541 tgcagctctt cccaggatgt attcagggaa gtaaaaagat ctcactgcat cacctgcagc
3601 cacatagttc ttgattctcc aagtgccagc atactccggg acacacagcc aacagggctg
3661 ccccaagcac ccattctcaa aaccctcaaa gctgccaagc aaacagaatg agagttatag
3721 gaaactgttc tctcttctat ctccaaacaa ctctgtgcct ctttcctacc tgacctttag
3781 ggctaatcca tgtggcagct gttagctgca tctttccaga gcgtcagtac tgagaggaca
3841 ctaagcatgt gaccttcact actcctgttc tgaattc
LOCUS       HSU59436      182 bp    DNA             PRI       19-JUN-1996
DEFINITION  Human low-density lipoprotein receptor (ldlr) gene, exon 12,
            partial cds.
ACCESSION   U59436
NID         g1381233
KEYWORDS    .
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 182)
  AUTHORS   Sibul,H. and Metspalu,A.
  TITLE     A new polymorphism in exon 12 of the human low-density lipoprotein
            receptor (LDLR) gene
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 182)
  AUTHORS   Sibul,H.
  TITLE     Direct Submission
  JOURNAL   Submitted (29-MAY-1996) Hiljar Sibul, Estonian Biocentre,
            Biotechnology, Riia 23, Tartu, Estonia, 2400
FEATURES             Location/Qualifiers
     source          1..182
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     intron          <1..25
                     /gene="ldlr"
                     /number=11
     primer_bind     1..21
                     /gene="ldlr"
     gene            1..182
                     /gene="ldlr"
     exon            26..165
                     /gene="ldlr"
                     /number=12
                     /product="low-density lipoprotein receptor"
     CDS             <26..>165
                     /gene="ldlr"
                     /note="LDLR"
                     /codon_start=3
                     /product="low-density lipoprotein receptor"
                     /db_xref="PID:g1381234"
                     /translation="LLSGRLYWVDSKLHSISSIDVNGGNRKTILEDEKRLAHPFSLAV
                     FE"
     variation       replace(45,"t")
                     /gene="ldlr"
                     /frequency="0.17"
     primer_bind     complement(163..182)
                     /gene="ldlr"
     intron          166..>182
                     /gene="ldlr"
                     /number=12
BASE COUNT       36 a     53 c     44 g     49 t
ORIGIN
  1 tctccttatc cacttgtgtg tctagatctc ctcagtggcc gcctctactg ggttgactcc
 61 aaacttcact ccatctcaag catcgatgtc aatgggggca accggaagac catcttggag
121 gatgaaaaga ggctggccca ccccttctcc ttggccgtct ttgaggtgtg gcttacgtac
181 ga
********
LOCUS       HSCLA1GNA    2566 bp    RNA             PRI       06-OCT-1993
DEFINITION  H. sapiens encoding CLA-1 mRNA.
ACCESSION   Z22555
NID         g397606
KEYWORDS    CLA-1.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1
  AUTHORS   Calvo,D. and Vega,M.A.
  JOURNAL   J. Biol. Chem. 268 (25), 18929-18935 (1993)
  MEDLINE   93366811
REFERENCE   2  (bases 1 to 2566)
  AUTHORS   Vega,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (15-APR-1993) VEGA M., HOSPITAL DE LA PRINCESA, UNIDAD
            DE BIOLOGIA MOLECULAR, C/ DIEGO DE LEON 62, MADRID, MADRID, SPAIN
FEATURES             Location/Qualifiers
     source          1..2566
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /cell_type="promyelocytes"
                     /cell_line="HL60"
                     /clone_lib="HL60 cDNA library, Angel L. Corbi"
     5'UTR           1..69
     CDS             70..1599
                     /codon_start=1
                     /product="CLA-1"
                     /db_xref="PID:g397607"
                     /translation="MGCSAKARWAAGALGVAGLLCAVLGAVMIVMVPSLIKQQVLKNV
                     RIDPSSLSFNMWKEIPIPFYLSVYFFDVMNPSEILKGEKPQVRERGPYVYRESRHKSN
                     ITFNNNDTVSFLEYRTFQFQPSKSHGSESDYIVMPNILVLGAAVMMENKPMTLKLIMT
                     LAFTTLGERAFMNRTVGEIMWGYKDPLVNLINKYFPGMFPFKDKFGLFAELNNSDSGL
                     FTVFTGVQNISRIHLVDKWNGLSKVDFWHSDQCNMINGTSGQMWPPFMTPESSLEFYS
                     PEACRSMKLMYKESGVFEGIPTYRFVAPKTLFANGSIYPPNEGFCPCLESGIQNVSTC
                     RFSAPLFLSHPHFLNADPVLAEAVTGLHPNQEAHSLFLDIHPVTGIPMNCSVKLQLSL
                     YMKSVAGIGQTGKIEPVVLPLLWFAESGAMEGETLHTFYTQLVLMPKVMHYAQYVLLA
                     LGCVLLLVPVICQIRSQEKCYLFWSSSKKGSKDKEAIQAYSESLMTSAPKGSVLQEAK
                     L"
     3'UTR           1600..2566
     polyA_site      2532..2537
BASE COUNT      528 a    811 c    695 g    532 t
ORIGIN
   1 cgtcgccgtc cccgtctcct gccaggcgcg gagccctgcg agccgcgggt gggccccagg
  61 cgcgcagaca tgggctgctc cgccaaagcg cgctgggctg ccggggcgct gggcgtcgcg
 121 gggctactgt gcgctgtgct gggcgctgtc atgatcgtga tggtgccgtc gctcatcaag
 181 cagcaggtcc ttaagaacgt gcgcatcgac cccagtagcc tgtccttcaa catgtggaag
 241 gagatcccta tccccttcta tctctccgtc tacttctttg acgtcatgaa ccccagcgag
 301 atcctgaagg gcgagaagcc gcaggtgcgg gagcgcgggc cctacgtgta cagggagtcc
 361 aggcacaaaa gcaacatcac cttcaacaac aacgacaccg tgtccttcct cgagtaccgc

FIG. 38A
421 accttccagt tccagccctc caagtcccac ggctcggaga gcgactacat cgtcatgccc
481 aacatcctgg tcttgggtgc ggcggtgatg atggagaata agcccatgac cctgaagctc
541 atcatgacct tggcattcac caccctcggc gaacgtgcct tcatgaaccg cactgtgggt
601 gagatcatgt ggggctacaa ggaccccctt gtgaatctca tcaacaagta ctttccaggc
661 atgttcccct tcaaggacaa gttcggatta tttgctgagc tcaacaactc cgactctggg
721 ctcttcacgg tgttcacggg ggtccagaac atcagcagga tccacctcgt ggacaagtgg
781 aacgggctga gcaaggttga cttctggcat tccgatcagt gcaacatgat caatggaact
841 tctgggcaaa tgtggccgcc cttcatgact cctgagtcct cgctggagtt ctacagcccg
901 gaggcctgcc gatccatgaa gctaatgtac aaggagtcag gggtgtttga aggcatcccc
961 acctatcgct tcgtggctcc caaaaccctg tttgccaacg ggtccatcta cccacccaac
1021 gaaggcttct gcccgtgcct ggagtctgga attcagaacg tcagcacctg caggttcagt
1081 gcccccttgt ttctctccca tcctcacttc ctcaacgccg acccggttct ggcagaagcg
1141 gtgactggcc tgcaccctaa ccaggaggca cactccttgt tcctggacat ccacccggtc
1201 acgggaatcc ccatgaactg ctctgtgaaa ctgcagctga gcctctacat gaaatctgtc
1261 gcaggcattg gacaaactgg gaagattgag cctgtggtcc tgccgctgct ctggtttgca
1321 gagagcgggg ccatggaggg ggagactctt cacacattct acactcagct ggtgttgatg
1381 cccaaggtga tgcactatgc ccagtacgtc ctcctggcgc tgggctgcgt cctgctgctg
1441 gtccctgtca tctgccaaat ccggagccaa gagaaatgct atttattttg gagtagtagt
1501 aaaaagggct caaaggataa ggaggccatt caggcctatt ctgaatccct gatgacatca
1561 gctcccaagg gctctgtgct gcaggaagca aaactgtagg gtcctgagga caccgtgagc
1621 cagccaggcc tggccgctgg gcctgaccgg ccccccagcc cctacacccc gcttctcccg
1681 gactctccca gcagacagcc ccccagcccc acagcctgag cctcccagct gccatgtgcc
1741 tgttgcacac ctgcacacac gccctggcac acatacacac atgcgtgcag gcttgtgcag
1801 acactcaggg atggagctgc tgctgaaggg acttgtaggg agaggctcgt caacaagcac
1861 tgttctggaa ccttctctcc acgtggccca caggctgacc acaggggctg tgggtcctgc
1921 gtccccttcc tcgggtgagc ctggcctgtc ccgttcagcc gttgggccag gcttcctccc
1981 ctccaaggtg aaacactgca gtcccggtgt ggtggctccc catgcaggac gggccaggct
2041 gggagtgccg ccttcctgtg ccaaattcag tggggactca gtgcccaggc cctggcacga
2101 gctttggcct tggtctacct gccaggccag gcaaagcgcc tttacacagg cctcggaaaa
2161 caatggagtg agcacaagat gccctgtgca gctgcccgag ggtctccgcc caccccggcc
2221 ggactttgat ccccccgaag tcttcacagg cactgcatcg ggttgtctgg cgcccttttc
2281 ctccagccta aactgacatc atcctatgga ctgagccggc cactctctgg ccgaagtggc
2341 gcaggctgtg cccccgagct gcccccaccc cctcacaggg tccctcagat tataggtgcc
2401 caggctgagg tgaagaggcc tgggggccct gccttccggg cgctcctgga ccctggggca
2461 aacctgtgac ccttttctac tggaatagaa atgagtttta tcatctttga aaaataattc
2521 actcttgaag taataaacgt ttaaaaaaat ggaaaaaaaa aaaaa
1. Claims: 1-12 (partially)

INVENTION 1: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the AT3 gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



2. Claims: 1-12 (partially) 

INVENTION 2: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the CETP gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



3. Claims: 1-12 (partially) 

INVENTION 3: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the CLanalog 
gene and comprising a polymorphic (biallelic) site according 
to the nucleotide positions as indicated in the attached 
table - column 4, an allele-specific oligonucleotide 
hybridizing to such a polymorphic site, an isolated gene 
product encoded by such a nucleic acid molecule, and a 
method of analyzing such a nucleic acid by determining the 
bases occupying the polymorphic site(s). 



4. Claims: 1-12 (partially) 

INVENTION 4: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the F2R gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



5. Claims: 1-12 (partially) 

INVENTION 5: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the F2 gene
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



6. Claims: 1-12 (partially) 

INVENTION 6: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the F3 gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



7. Claims: 1-12 (partially) 

INVENTION 7: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the F5 gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



8. Claims: 1-12 (partially) 

INVENTION 8: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the HCF2 gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



9. Claims: 1-12 (partially) 

INVENTION 9: A nucleic acid molecule of at least 5
nucleotides in length consisting of a part of the HMGCR gene
and comprising a polymorphic (biallelic) site according to
the nucleotide positions as indicated in the attached table
- column 4, an allele-specific oligonucleotide hybridizing
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



10. Claims: 1-12 (partially) 

INVENTION 10: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the ITGA2B 
gene and comprising a polymorphic (biallelic) site according
to the nucleotide positions as indicated in the attached 
table - column 4, an allele-specific oligonucleotide 
hybridizing to such a polymorphic site, an isolated gene 
product encoded by such a nucleic acid molecule, and a 
method of analyzing such a nucleic acid by determining the 
bases occupying the polymorphic site(s). 



11. Claims: 1-12 (partially) 

INVENTION 11: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the ITB3 gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s).



12. Claims: 1-12 (partially) 

INVENTION 12: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the LCAT gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



13. Claims: 1-12 (partially) 

INVENTION 13: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the LDLR gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the
polymorphic site(s). 



14. Claims: 1-12 (partially) 

INVENTION 14: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the LPL gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



15. Claims: 1-12 (partially) 

INVENTION 15: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the PROC gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



16. Claims: 1-12 (partially) 

INVENTION 16: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the PTAFR gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 



17. Claims: 1-12 (partially) 

INVENTION 17: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the TFPI gene 
and comprising a polymorphic (biallelic) site according to 
the nucleotide positions as indicated in the attached table 
- column 4, an allele-specific oligonucleotide hybridizing 
to such a polymorphic site, an isolated gene product encoded 
by such a nucleic acid molecule, and a method of analyzing 
such a nucleic acid by determining the bases occupying the 
polymorphic site(s). 
18. Claims: 1-12 (partially)

INVENTION 18: A nucleic acid molecule of at least 5 
nucleotides in length consisting of a part of the TBXA2R 
gene and comprising a polymorphic (biallelic) site according 
to the nucleotide positions as indicated in the attached 
table - column 4, an allele-specific oligonucleotide 
hybridizing to such a polymorphic site, an isolated gene 
product encoded by such a nucleic acid molecule, and a 
method of analyzing such a nucleic acid by determining the 
bases occupying the polymorphic site(s). 